(54) Title: GENES AMPLIFIED IN CANCER CELLS
(57) Abstract 

New methods are disclosed for detecting cancer-associated
genes and obtaining corresponding cDNA sequences. The
methods involve supplying RNA preparations from control cells,
and from a plurality of different cancer cells that share a
duplicated or deleted gene in the same region of a chromosome.
Amplified cDNA copies are displayed, and then selected based
on differences in abundance of RNA between preparations.
Optional additional screening steps involve surveying panels of
cancer cells using the cDNA for RNA overabundance with or
without gene duplication. The identified genes can be used
in turn to develop materials and techniques for diagnosing and
treating the underlying cancer. Four novel genes associated with
cancer have been identified. In at least about 60% of the breast
cancer cell lines tested, RNA hybridizing with the cDNAs was
substantially more abundant than in normal cells. Most of the
cell lines also showed a duplication of the corresponding gene,
which probably contributed to the increased level of RNA in the
cell. However, for each of the four genes, there were some cell
lines which had RNA overabundance without gene duplication.
This suggests that the gene product is sufficiently important
to the cancer process that cells will use several alternative
mechanisms to achieve increased expression.

Genes Amplified in Cancer Cells

PRIORITY CLAIM

This application claims the priority benefit of the following U.S. Patent applications:

60/015,167, filed April 9, 1996; 60/019,202, filed June 6, 1996; 08/678,280, filed July 10, 1996. For
purposes of prosecution in the U.S., the aforementioned applications are hereby incorporated herein
by reference in their entirety.

TECHNICAL FIELD

The present invention relates generally to the field of human genetics. More specifically, it
relates to the identification of novel genes associated with overabundance of RNA in human cancer,
such as breast cancer. It pertains especially to those genes and the products thereof which may be
important in diagnosis and treatment.

BACKGROUND OF THE INVENTION

Cancer is a heterogeneous disease. It manifests itself in a wide variety of tissue sites, with
different degrees of de-differentiation, invasiveness, and aggressiveness. Some forms of cancer
are responsive to traditional modes of therapy, but many are not. For most common cancers, there
is a pressing need to improve the arsenal of therapies available to provide more precise and more
effective treatment in a less invasive way.

As an example, breast cancer has an unsatisfactory morbidity and mortality, despite
presently available forms of medical intervention. Traditional clinical initiatives are focused on early
diagnosis, followed by surgery and chemotherapy. Such interventions are of limited success,
particularly in patients where the tumor has undergone metastasis.

The heterogeneous nature of cancer arises because different cancer cells achieve their
growth and pathological properties by different phenotypic alterations. Alteration of gene
expression is intimately related to the uncontrolled growth and de-differentiation that are hallmarks
of cancer. Certain similar phenotypic alterations in turn may have a different genetic base in
different tumors. Yet, the number of genes central to the malignant process must be a finite one.
Accordingly, new pharmaceuticals that are tailored to specific genetic alterations in an individual
tumor may be more effective.

There are two types of altered gene expression that take place, together or independently,
in different cancer cells (reviewed by Bishop). The first type is the decreased expression of
recessive genes, known as tumor suppressor genes, that apparently act to prevent malignant
growth. The second type is the increased expression of dominant genes, such as oncogenes, that
act to promote malignant growth, or to provide some other phenotype critical for malignancy. Thus,
alteration in the expression of either type of gene is a potential diagnostic indicator. Furthermore, a
treatment strategy might seek to reinstate the expression of suppressor genes, or reduce the
expression of dominant genes. The present invention is directed to identifying genes of either type,
particularly those of the second type.

The most frequently studied mechanism for gene overexpression in cancer cells is
sometimes referred to as amplification. This is a process whereby the gene is duplicated within the
chromosomes of the ancestral cell into multiple copies. The process involves unscheduled
replications of the region of the chromosome comprising the gene, followed by recombination of the
replicated segments back into the chromosome (Alitalo et al.). As a result, 50 or more copies of
the gene may be produced. The duplicated region is sometimes referred to as an "amplicon". The
level of expression of the gene (that is, the amount of messenger RNA produced) escalates in the
transformed cell in the same proportion as the number of copies of the gene that are made (Alitalo
et al.).

Several human oncogenes have been described, some of which are duplicated, for
example, in a significant proportion of breast tumors. A prototype is the erbB2 gene (also known
as HER-2/neu), which encodes a 185 kDa membrane growth factor receptor homologous to the
epidermal growth factor receptor. erbB2 is duplicated in 61 of 283 tumors (22%) tested in a recent
survey (Adnane et al.). Other oncogenes duplicated in breast cancer are the bek gene, duplicated
in 34 out of 286 (12%); the flg gene, duplicated in 37 out of 297 (12%); and the myc gene, duplicated in
43 out of 275 (16%) (Adnane et al.).
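The duplication frequencies quoted above from Adnane et al. can be checked with simple arithmetic. The following is a minimal sketch; the dictionary and function names are hypothetical and chosen only for illustration, while the counts are those cited in the preceding paragraph.

```python
# Counts of tumors showing gene duplication, as cited above from Adnane et al.
# The variable and function names are illustrative assumptions.
DUPLICATION_COUNTS = {
    "erbB2": (61, 283),
    "bek": (34, 286),
    "flg": (37, 297),
    "myc": (43, 275),
}


def duplication_percent(duplicated: int, tested: int) -> int:
    """Return the duplication frequency rounded to the nearest whole percent."""
    return round(100 * duplicated / tested)


for gene, (duplicated, tested) in DUPLICATION_COUNTS.items():
    percent = duplication_percent(duplicated, tested)
    print(f"{gene}: {duplicated}/{tested} = {percent}%")
```

Rounded to whole percentages, the four ratios reproduce the figures of 22%, 12%, 12%, and 16% given in the text.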

Work with other oncogenes, particularly those described for neuroblastoma, suggested that
gene duplication of the proto-oncogene was an event involved in the more malignant forms of
cancer, and could act as a predictor of clinical outcome (reviewed by Schwab et al. and Alitalo et
al.). In breast cancer, duplication of the erbB2 gene has been reported as correlating both with
reoccurrence of the disease and decreased survival times (Slamon et al.). There is some evidence
that erbB2 helps identify tumors that are responsive to adjuvant chemotherapy with
cyclophosphamide, doxorubicin, and fluorouracil (Muss et al.).

It is clear that only a proportion of the genes that can undergo gene duplication in cancer
have been identified. First, chromosome abnormalities, such as double minute (DM) chromosomes
and homogeneously stained regions (HSRs), are abundant in cancer cells. HSRs are
chromosomal regions that appear in karyotype analysis with intermediate density Giemsa staining
throughout their length, rather than with the normal pattern of alternating dark and light bands.
They correspond to multiple gene repeats. HSRs are particularly abundant in breast cancers,
showing up in 60-65% of tumors surveyed (Dutrillaux et al., Zafrani et al.). When such regions are
checked by in situ hybridization with probes for any of 16 known human oncogenes, including
erbB2 and myc, only a proportion of tumors show any hybridization to HSR regions. Furthermore,
only a proportion of the HSRs within each karyotype are implicated.

Second, comparative genomic hybridization (CGH) has revealed the presence of copy
number increases in tumors, even in chromosomal regions outside of HSRs. CGH is a new
method in which whole chromosome spreads are stained simultaneously with DNA fragments from
normal cells and from cancer cells, using two different fluorochromes. The images are
computer-processed for the fluorescence ratio, revealing chromosomal regions that have
undergone amplification or deletion in the cancer cells (Kallioniemi et al. 1992). This method was
recently applied to 15 breast cancer cell lines (Kallioniemi et al. 1994). DNA sequence copy
number increases were detected in all 23 chromosome pairs.
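The ratio-based reading of a CGH image can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the region labels, the intensity values, and the two cutoffs are all hypothetical and are not taken from the text or from the cited references.

```python
# Illustrative sketch of ratio-based region calling in CGH.
# The cutoffs (1.25 and 0.75) and all signal values are assumptions.

def classify_cgh_regions(tumor_signal, normal_signal,
                         gain_cutoff=1.25, loss_cutoff=0.75):
    """Label each chromosomal region by its tumor/normal fluorescence ratio."""
    calls = {}
    for region, tumor in tumor_signal.items():
        ratio = tumor / normal_signal[region]
        if ratio >= gain_cutoff:
            calls[region] = "gain"      # candidate amplified region
        elif ratio <= loss_cutoff:
            calls[region] = "loss"      # candidate deleted region
        else:
            calls[region] = "balanced"
    return calls


# Hypothetical fluorescence intensities for four chromosomal regions.
tumor = {"1q": 180.0, "8q": 300.0, "13q": 50.0, "Xp": 100.0}
normal = {"1q": 100.0, "8q": 100.0, "13q": 100.0, "Xp": 100.0}
print(classify_cgh_regions(tumor, normal))
```

As the text notes, such a ratio indicates only the approximate chromosomal region involved, not the particular gene.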

Cloning the genes that undergo duplication in cancer is a formidable challenge. In one
approach, human oncogenes have been identified by hybridizing with probes for other known
growth-promoting genes, particularly known oncogenes in other species. For example, the erbB2
gene was identified using a probe from a chemically induced rat neuroglioblastoma (Slamon et al.).
Genes with novel sequences and functions will evade this type of search. In another approach,
genes may be cloned from an area identified as containing a duplicated region by the CGH method.
Since CGH is able to indicate only the approximate chromosomal region of duplicated genes, an
extensive amount of experimentation is required to walk through the entire region and identify the
particular gene involved.

Genes may also be overexpressed in cancer without being duplicated. Methods that rely
on identification from genetic abnormalities necessarily bypass such genes. Increased expression
can come about through a higher level of transcription of the gene; for example, by up-regulation of
the promoter or substitution with an alternative promoter. It can also occur if the transcription
product is able to persist longer in the cell; for example, by increasing the resistance to cytoplasmic
RNase or by reducing the level of such cytoplasmic enzymes. Two examples are the epidermal
growth factor receptor, overexpressed in 45% of breast cancer tumors (Klijn et al.), and the IGF-1
receptor, overexpressed in 50-93% of breast cancer tumors (Berns et al.). In almost all cases, the
overexpression of each of these receptors is by a mechanism other than gene duplication.

One way of examining overexpression at the messenger RNA level is by subtractive
hybridization. It involves producing positive and negative cDNA strands from two RNA
preparations, and looking for cDNA which is not completely hybridized by the opposing preparation.
This is a laborious procedure which has distinct limitations in cancer research. In particular, since
each subtraction involves cDNA from only two cell populations at a time, it is sensitive to individual
phenotypic differences due not just to the presence of cancer, but also to natural metabolic
variations.

Another way of examining overexpression at the messenger RNA level is by differential
display (Liang et al. 1992a). In this technique, cDNA is prepared from only a subpopulation of each
RNA preparation, and expanded via the polymerase chain reaction using primers of particular
specificity. Similar subpopulations are compared across several RNA preparations by gel
autoradiography for expression differences. In order to survey the RNA preparations entirely, the
assay is repeated with a comprehensive set of PCR primers. The screening strategy more
effectively includes multiple positive and negative control samples (Sunday et al.). The method has
recently been applied to breast cancer cell lines, and highlights a number of expression differences
(Liang et al. 1992b; Chen et al.; McKenzie et al.; Watson et al. 1994 & 1996; Kocher et al.). By
excising the corresponding region of the separating gel, it is possible to recover and sequence the
cDNA.

Despite the advancement provided by differential display, problems remain in terms of
applying it in the search for new cancer genes. First, because this is a test for RNA levels, any
phenotypic differences between cell lines constitute part of the recovered set, leading to a large
proportion of "false positive" identifications. It has been found that cDNAs for mitochondrial genes
constitute a large proportion of the differentially expressed bands, and it consumes substantial
resources to recover the sample and obtain a partial sequence in order to eliminate them. Second,
false positive identifications are made for reasons attributed to multiple cDNA species and
competition for the PCR primers by RNA species of different abundance (Debouck). Third,
differential display highlights high copy number mRNAs and shorter mRNAs (Bertioli et al.,
Yeatman et al.), and may therefore miss critical cancer-associated transcripts when used as a
survey technique. Fourth, a number of adjustments are made to gene expression levels when a
cell undergoes malignant transformation or is cultured in vitro. Most of these adjustments are
secondary, and not part of the transformation process. Thus, even when a novel sequence is
obtained from the differential display, it is far from certain that the corresponding gene is at the root
of the disease process.

An early step in developing gene-specific therapeutic approaches is the identification of
genes that are more central to malignant transformation or the persistence of the malignant
phenotype.

DISCLOSURE OF THE INVENTION

It is an objective of this invention to provide a method for identifying and characterizing
genes and gene products which are duplicated or associated with overabundant RNA in cancer
cells. The method can be used for any type of cancer, providing a plurality of cell populations or
cell lines of the type of cancer are available, in conjunction with a suitable control cell population.
The method is highly effective in identifying genes and gene products that are intimately related to
malignant transformation or maintenance of the malignant properties of the cancer cells.

An important derivative of applying the method is the selection and retrieval of cDNA and
cDNA fragments corresponding to the cancer-associated gene. These fragments can be used
inter alia to determine the nucleotide sequence of the gene and mRNA, the amino acid sequence of
any encoded protein, or to retrieve from a cDNA or genomic library additional polynucleotides
related to the gene or its transcripts. Since the genes are typically involved in the malignant
process of the cell, the polynucleotides, polypeptides, and antibodies derived by using this method
can in turn be used to design or screen important diagnostic reagents and therapeutic compounds.

Another objective of this invention is to provide isolated polynucleotides, polypeptides, and
antibodies derived from four novel genes which are associated with several different types of cancer,
including breast cancer. The genes are designated CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and
CH14-2a16-1. These designations refer to both strands of the cDNA and fragments thereof, and to
the respective corresponding messenger RNA, including splice variants, allelic variants, and
fragments of any of these forms. These genes show RNA overabundance in a majority of cancer cell
lines tested. A majority of the cells showing RNA overabundance also have duplication of the
corresponding gene. Another object of this invention is to provide materials and methods based on
these polynucleotides, polypeptides, and antibodies for use in the diagnosis and treatment of cancer,
particularly breast cancer.

Accordingly, one embodiment of this invention is an isolated polynucleotide comprising a
linear sequence contained in a polynucleotide selected from the group consisting of CH1-9a11-2,
CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1. The linear sequence is contained in a duplicated
gene or overabundant RNA in cancerous cells. The RNA may be overabundant due to gene
duplication, increased RNA transcription or processing, increased RNA persistence, any combination
thereof, or by any other mechanism, in a proportion of breast cancer cells. Preferably, the RNA is
overabundant in at least about 20% of a representative panel of breast cancer cell lines, such as the
panels listed herein; more preferably, it is overabundant in at least about 40% of the panel; even more
preferably, it is overabundant in at least 60% or more of the panel. Preferably, the RNA is
overabundant in at least about 5% of spontaneously occurring breast cancer tumors; more preferably,
it is overabundant in at least about 10% of such tumors; more preferably, it is overabundant in at least
about 20% of such tumors; more preferably, it is overabundant in at least about 30% of such tumors;
even more preferably, it is overabundant in at least about 50% of such tumors.
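The graded preferences for a cell line panel (about 20%, 40%, and 60%) can be expressed as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, and it treats each "at least about" figure as an exact cutoff, which is a simplifying assumption.

```python
# Panel thresholds (percent of cell lines with overabundant RNA) taken from
# the paragraph above, listed from most to least preferred.
PANEL_THRESHOLDS = (60, 40, 20)


def highest_panel_threshold(overabundant: int, panel_size: int):
    """Return the highest threshold met by the panel, or None if none is met.

    Treating "at least about N%" as an exact cutoff is an assumption.
    """
    percent = 100 * overabundant / panel_size
    for threshold in PANEL_THRESHOLDS:
        if percent >= threshold:
            return threshold
    return None


# Hypothetical panel: RNA overabundant in 7 of 10 cell lines.
print(highest_panel_threshold(7, 10))
```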

Preferably, a sequence of at least 10 nucleotides is essentially identical between the isolated
polynucleotide of the invention and a cDNA from CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and
CH14-2a16-1; more preferably, a sequence of at least about 15 nucleotides is essentially identical; more
preferably, a sequence of at least about 20 nucleotides is essentially identical; more preferably, a
sequence of at least about 30 nucleotides is essentially identical; more preferably, a sequence of at
least about 40 nucleotides is essentially identical; even more preferably, a sequence of at least about
70 nucleotides is essentially identical; still more preferably, a sequence of about 100 nucleotides or
more is essentially identical. A further embodiment of this invention is an isolated polynucleotide
comprising a linear sequence essentially identical to a sequence selected from the group consisting of
SEQ. ID NO:16, SEQ. ID NO:18, SEQ. ID NO:21, SEQ. ID NO:23, SEQ. ID NO:26, SEQ. ID NO:29,
SEQ. ID NO:31, SEQ. ID NO:33, and SEQ. ID NO:35. These embodiments include an isolated
polynucleotide which is a DNA polynucleotide, an RNA polynucleotide, a polynucleotide probe, or a
polynucleotide primer.

This invention also provides an isolated polypeptide comprising a sequence of amino acids
essentially identical to the polypeptide encoded by or translated from a polynucleotide selected from
the group consisting of CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1. Preferably, a
sequence of at least about 5 amino acids is essentially identical between the polypeptide of this
invention and that encoded by the polynucleotide; more preferably, a sequence of at least about 10
amino acids is essentially identical; more preferably, a sequence of at least 15 amino acids is
essentially identical; even more preferably, a sequence of at least 20 amino acids is essentially
identical; still more preferably, a sequence of about 30 amino acids or more is essentially identical.
Preferably, the polypeptide comprises a linear sequence of at least 15 amino acids essentially
identical to a sequence encoded by said polynucleotide. Another embodiment of this invention is a
polypeptide comprising a linear sequence essentially identical to a sequence selected from the group
consisting of SEQ. ID NO:17, SEQ. ID NO:20, SEQ. ID NO:25, SEQ. ID NO:28, SEQ. ID NO:30,
SEQ. ID NO:32, SEQ. ID NO:34, and SEQ. ID NO:37.
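The length criteria above refer to a stretch of sequence shared between two molecules. As a rough illustration, the longest exactly matching contiguous run between two sequences can be computed as follows. Exact matching is a simplification adopted here, since this passage does not define "essentially identical"; the function names and the example sequences are hypothetical.

```python
# Illustrative sketch: longest contiguous run shared by two sequences.
# Exact character matching is an assumption standing in for
# "essentially identical".

def longest_shared_run(seq_a: str, seq_b: str) -> int:
    """Return the length of the longest contiguous stretch found in both."""
    best = 0
    previous = [0] * (len(seq_b) + 1)
    for a in seq_a:
        current = [0] * (len(seq_b) + 1)
        for j, b in enumerate(seq_b, start=1):
            if a == b:
                # Extend the run that ended at the previous pair of positions.
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


def meets_minimum(seq_a: str, seq_b: str, minimum: int = 10) -> bool:
    """Check a shared run against a minimum length, such as 10 nucleotides."""
    return longest_shared_run(seq_a, seq_b) >= minimum


# Made-up sequences for demonstration only.
print(longest_shared_run("AAACCC", "GGACCG"))
```

The same function applies to amino acid sequences written in single-letter code, with the minimum set to 5, 10, 15, 20, or 30 as appropriate.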

A further embodiment of this invention is an antibody specific for a polypeptide embodied in
this invention. This encompasses both monoclonal and isolated polyclonal antibodies.

A further embodiment of this invention is a method of using the polynucleotides of this
invention for detecting or measuring gene duplication in cancerous cells, especially but not limited to
breast cancer cells, comprising the steps of reacting DNA contained in a clinical sample with a
reagent comprising the polynucleotide, said clinical sample having been obtained from an individual
suspected of having cancerous cells; and comparing the amount of complexes formed between the
reagent and the DNA in the clinical sample with the amount of complexes formed between the
reagent and DNA in a control sample.

A further embodiment is a method of using the polynucleotides of this invention for detecting
or measuring overabundance of RNA in cancerous cells, especially but not limited to breast cancer
cells, comprising the steps of reacting RNA contained in a clinical sample with a reagent comprising
the polynucleotide, said clinical sample having been obtained from an individual suspected of having
cancerous cells; and comparing the amount of complexes formed between the reagent and the RNA
in the clinical sample with the amount of complexes formed between the reagent and RNA in a control
sample.

Another embodiment of this invention is a diagnostic kit for detecting or measuring gene
duplication or RNA overabundance in cells contained in an individual as manifest in a clinical sample,
comprising a reagent and a buffer in suitable packaging, wherein the reagent comprises a
polynucleotide of this invention.

Another embodiment of this invention is a method of using a polypeptide of this invention for
detecting or measuring specific antibodies in a clinical sample, comprising the steps of reacting
antibodies contained in the clinical sample with a reagent comprising the polypeptide, said clinical
sample having been obtained from an individual suspected of having cancerous cells, especially but
not limited to breast cancer cells; and comparing the amount of complexes formed between the
reagent and the antibodies in the clinical sample with the amount of complexes formed between the
reagent and antibodies in a control sample.

Another embodiment of this invention is a method of using an antibody of this invention for
detecting or measuring altered protein expression in a clinical sample, comprising the steps of
reacting a polypeptide contained in the clinical sample with a reagent comprising the antibody, said
clinical sample having been obtained from an individual suspected of having cancerous cells,
especially but not limited to breast cancer cells; and comparing the amount of complexes formed
between the reagent and the polypeptide in the clinical sample with the amount of complexes formed
between the reagent and a polypeptide in a control sample. Further embodiments of this invention
are diagnostic kits for detecting or measuring a polypeptide or antibody present in a clinical sample,
comprising a reagent and a buffer in suitable packaging, wherein the reagent respectively comprises
either an antibody or a polypeptide of this invention.

Yet another embodiment of this invention is a host cell transfected by a polynucleotide of this
invention. A further embodiment of this invention is a method for using a polynucleotide for screening
a pharmaceutical candidate, comprising the steps of separating progeny of the transfected host cell
into a first group and a second group; treating the first group of cells with the pharmaceutical
candidate; not treating the second group of cells with the pharmaceutical candidate; and comparing
the phenotype of the treated cells with that of the untreated cells.

This invention also embodies a pharmaceutical preparation for use in cancer therapy,
comprising a polynucleotide or polypeptide embodied by this invention, said preparation being
capable of reducing the pathology of cancerous cells, especially for but not limited to breast cancer
cells. Further embodiments of this invention are methods for treating an individual bearing cancerous
cells, such as breast cancer cells, comprising administering any of the aforementioned
pharmaceutical preparations.

Still another embodiment of this invention is a pharmaceutical preparation or active vaccine
comprising a polypeptide embodied by this invention in an immunogenic form and a pharmaceutically
compatible excipient. A further embodiment is a method for treatment of cancer, especially but not
limited to breast cancer, either prophylactically or after cancerous cells are present in an individual
being treated, comprising administration of the aforementioned pharmaceutical preparation.

Another series of embodiments of this invention relate to methods for obtaining cDNA
corresponding to a gene associated with cancer, comprising the steps of: a) supplying an RNA
preparation from uncultured control cells; b) supplying RNA preparations from at least two different
cancer cells; c) displaying cDNA corresponding to the RNA preparations of step a) and step b)
such that different cDNA corresponding to different RNA in each preparation are displayed
separately; d) selecting cDNA corresponding to RNA that is present in greater abundance in the
cancer cells of step b) relative to the control cells of step a); e) supplying a digested DNA
preparation from control cells; f) supplying digested DNA preparations from at least two different
cancer cells; g) hybridizing the cDNA of step d) with the digested DNA preparations of step e) and
step f); and h) further selecting cDNA from the cDNA of step d) corresponding to genes that are
duplicated in the cancer cells of step f) relative to the control cells of step e).
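The two selection steps, d) and h), can be pictured as successive filters over measured signals. The sketch below is a hedged illustration rather than the procedure itself: the function names, the data layout, the band labels, the intensity values, the 2-fold cutoff, and the requirement that every cancer preparation exceed the control are all assumptions introduced for the example.

```python
# Illustrative sketch only: names, data layout, and the 2-fold cutoff are
# assumptions made for this example, not values taken from the text.

def exceeds_control(control_signal, cancer_signals, fold=2.0):
    """Return True if every cancer preparation exceeds the control by `fold`."""
    return all(signal >= fold * control_signal for signal in cancer_signals)


def select_candidates(rna_bands, dna_signals, fold=2.0):
    """Apply selection steps d) and h) to hypothetical signal intensities.

    Each mapping goes from a cDNA label to (control_signal, [cancer_signals]).
    """
    # Step d): RNA more abundant in the cancer cells than in the control cells.
    overabundant = [
        cdna for cdna, (control, cancers) in rna_bands.items()
        if exceeds_control(control, cancers, fold)
    ]
    # Step h): of those, keep cDNA whose gene is duplicated in the cancer cells.
    return [
        cdna for cdna in overabundant
        if exceeds_control(dna_signals[cdna][0], dna_signals[cdna][1], fold)
    ]


# Hypothetical intensities for three displayed cDNA bands.
rna = {"band_1": (10, [45, 60]), "band_2": (10, [12, 50]), "band_3": (10, [30, 25])}
dna = {"band_1": (5, [20, 15]), "band_2": (5, [5, 5]), "band_3": (5, [6, 5])}
print(select_candidates(rna, dna))  # prints ['band_1']
```

In this made-up data, band_3 passes step d) but not step h), which corresponds to RNA overabundance without gene duplication.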

One or more enhancements may optionally be included in the methods of this invention,
including the following:

1. Cancer cells are preferably used for step b) that share a duplicated gene in the same
region of a chromosome. If desired, the practitioner may test cancer cells beforehand
to detect the duplication or deletion of chromosome regions; or cancer cell lines may
be used that have already been characterized in this respect.

2. A higher plurality of cancer cells are preferably used to provide DNA for step b), step f),
or preferably both step b) and step f). The use of three cancer cells is preferred over
two; the use of four cancer cells is more preferred; about five cancer cells is still more
preferred; about eight cancer cells is even more preferred. The cDNA of each cancer
cell population is displayed or hybridized separately, in accordance with the method.

3. A higher plurality of control cells are preferably used to provide DNA for step a), step
e), or preferably both step a) and step e). The use of two control cell populations is
preferred; the use of three or more is even more preferred. Both proliferating and
non-proliferating populations are preferably used, if available.

4. The control cells are preferably supplied fresh from a tissue source, and are not
cultured or transformed into a cell line. This is increasingly important when the control
cell populations used in step a) number only one or two. Freshly obtained cancer
cells may also be used as an alternative to cancer cell lines, although this is less
critical.

5. An additional screening step is preferably conducted in which the cDNA corresponding
to the putative cancer-associated gene is additionally hybridized with a digested
mitochondrial DNA preparation, to eliminate mitochondrial genes. This screening step
may be conducted before, between, subsequent to, or simultaneously with the other
screening steps of the method.

6. An additional screening step is preferably conducted in which RNA is supplied from a
plurality of cancer cells, and one or preferably more control cell populations; the RNA is
contacted with cDNA corresponding to the putative cancer-associated gene under
conditions that permit formation of a stable duplex, and cDNA is selected
corresponding to RNA that is present in greater abundance in a proportion of the
cancer cells relative to the control cells. Preferably, the plurality of cancer cells is a
panel of at least five, preferably at least ten cells. Preferably at least three, more
preferably at least five of the cancer cells show greater abundance of RNA. Preferably
at least one and preferably more of the cancer cells shows a greater abundance of
RNA compared with control cells, but does not show duplication of the corresponding
gene in step h) of the method.
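The panel criteria in enhancement 6 can be sketched as a small check over per-cell-line results. This is an illustration under stated assumptions: the class name, field names, function name, and cell line labels are hypothetical, and the default numbers are the minimum preferred values given in the list above (a panel of at least five, at least three with greater RNA abundance, and at least one of those without gene duplication).

```python
from dataclasses import dataclass


@dataclass
class PanelResult:
    """Hypothetical record of one cancer cell line in the screening panel."""
    cell_line: str
    rna_overabundant: bool
    gene_duplicated: bool


def passes_panel_screen(results, min_panel=5, min_overabundant=3,
                        min_without_duplication=1):
    """Check the minimum preferred panel criteria from enhancement 6."""
    overabundant = [r for r in results if r.rna_overabundant]
    without_duplication = [r for r in overabundant if not r.gene_duplicated]
    return (
        len(results) >= min_panel
        and len(overabundant) >= min_overabundant
        and len(without_duplication) >= min_without_duplication
    )


# Hypothetical five-line panel; the labels are placeholders, not real lines.
panel = [
    PanelResult("line_A", True, True),
    PanelResult("line_B", True, True),
    PanelResult("line_C", True, False),
    PanelResult("line_D", False, False),
    PanelResult("line_E", False, False),
]
print(passes_panel_screen(panel))  # prints True
```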
Other embodiments of the invention are methods for obtaining cDNA corresponding to a
gene that is deleted or underexpressed in cancer, comprising the steps of: a) supplying an RNA
preparation from control cells; b) supplying RNA preparations from at least two different cancer
cells that share a deleted gene in the same region of a chromosome; c) displaying cDNA
corresponding to the RNA preparations of step a) and step b) such that different cDNA
corresponding to different RNA in each preparation are displayed separately; and d) selecting
cDNA corresponding to RNA that is present in lower abundance in the cancer cells of step b)
relative to the control cells of step a). Such methods typically comprise the following further steps:
e) supplying a digested DNA preparation from control cells; f) supplying digested DNA
preparations from at least two different cancer cells; g) hybridizing the cDNA of step d) with the
digested DNA preparations of step e) and step f); and h) further selecting cDNA from the cDNA of
step d) corresponding to a gene that is deleted in the cancer cells of step f) relative to the control
cells of step e). Such methods for identifying deleted or underexpressed genes may also comprise
enhancements such as those described above.

Additional embodiments of this invention are methods for characterizing cancer genes,
comprising obtaining cDNA corresponding to a cancer-associated gene according to a method of
this invention, particularly those highlighted above, and then sequencing the cDNA. Alternatively or
in addition, the cDNA may be used to rescue additional polynucleotides corresponding to a
cancer-associated gene from an mRNA preparation, or a cDNA or genomic DNA library.

Additional embodiments of this invention are methods for screening candidate drugs for
cancer treatment, comprising obtaining cDNA corresponding to a gene that is duplicated,
overexpressed, deleted, or underexpressed in cancer, and comparing the effect of the candidate
drug on a cell genetically altered with the cDNA or fragment thereof with the effect on a cell not
genetically altered.

Various embodiments of this invention may be employed in pursuit of any form of cancer
for which suitable tissue sources are available. Cancers of particular interest include lung cancer,
glioblastoma, pancreatic cancer, colon cancer, prostate cancer, hepatoma, myeloma, and breast
cancer.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a half-tone reproduction of an autoradiogram of a differential display experiment, in which
radiolabeled cDNA corresponding to a subset of total messenger RNA in different cells are compared.
This is used to select cDNA corresponding to particular RNA that are overabundant in breast cancer.

Figure 2 is a half-tone reproduction of an autoradiogram of electrophoresed DNA digests from a
panel of breast cancer cell lines probed with a CH8-2a13-1 insert (Panel A) or a loading control (Panel
B).

Figure 3 is a half-tone reproduction of an autoradiogram of electrophoresed total RNA from a panel of
breast cancer cell lines probed with a CH8-2a13-1 insert (Panel A) or a loading control (Panel B).

Figure 4 is a half-tone reproduction of an autoradiogram of electrophoresed DNA digests from a
panel of breast cancer cell lines probed with a CH13-2a12-1 insert.

Figure 5 is a half-tone reproduction of an autoradiogram of electrophoresed total RNA from a panel of
breast cancer cell lines probed with a CH13-2a12-1 insert.

Figure 6 is a map of cDNA fragments obtained for the breast cancer associated genes CH1-9a11-2,
CH8-2a13-1, CH13-2a12-1 and CH14-2a16-1. Regions of the fragments used to deduce sequence
data listed in the application are indicated by shading. Nucleotide positions are numbered from the
left-most residue for which double-strand sequence data has been obtained, which is not necessarily
the 5' terminus of the corresponding message.

Figure 7 is a listing of primers used for obtaining the cDNA sequence data for CH1-9a11-2.

Figure 8 is a listing of cDNA sequence obtained for CH1-9a11-2.

Figure 9 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH1-9a11-2 shown in Figure 8. The single-letter amino acid code is used.
Stop codons are indicated by a dot (•). The upper panel shows the complete amino acid translation;
the lower panel shows the predicted gene product protein sequence. A possible transmembrane
region is indicated by underlining.

Figure 10 is a listing of primers used for obtaining the cDNA sequence data for CH8-2a13-1.

Figure 11 is a listing of cDNA sequence obtained for CH8-2a13-1.

Figure 12 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH8-2a13-1 shown in Figure 11. The upper panel shows the complete amino
acid translation; the lower panel shows the predicted gene product protein sequence.

Figure 13 is a listing of the nucleotide sequence predicted for a full-length CH8-2a13-1 cDNA.

Figure 14 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH8-2a13-1 shown in Figure 13.

Figure 15 is a listing of primers used for obtaining the cDNA sequence data for CH13-2a12-1.

Figure 16 is a listing of cDNA sequence obtained for CH13-2a12-1.

Figure 17 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH13-2a12-1 shown in Figure 16. The upper panel shows the complete amino
acid translation; the lower panel shows the predicted gene product protein sequence.

Figure 18 is a listing of primers used for obtaining cDNA sequence data for CH13-2a12-1.

Figure 19 is a listing of the cDNA sequence data obtained by two-directional sequencing for
CH14-2a16-1.

Figure 20 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH14-2a16-1 shown in Figure 19. The upper panel shows the complete amino
acid translation; the lower panel shows the predicted gene product protein sequence. Residues
corresponding to three zinc finger motifs are underlined, indicating that the protein may have DNA or
RNA binding activity.

Figure 21 is a listing of additional DNA sequence data towards the 5' end of CH14-2a16-1 obtained
by one-directional sequencing of the fragment pCH14-1.3. First two panels show nucleotide and
amino acid sequence from the 5' end of the fragment; the second two panels show nucleotide and
amino acid sequence from the 3' end of the fragment. Regions of overlap with pCH14-800 are
underlined.

Figure 22 is a listing of the nucleotide sequences of initial fragments obtained corresponding to the
four breast cancer associated genes, along with their amino acid translations.

Figure 23 is a listing of additional cDNA sequence obtained for CH1-9a11-2, comprising
approximately 1934 base pairs 5' from the sequence of Figure 8.

Figure 24 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH1-9a11-2 shown in Figure 23. The single-letter amino acid code is used.
Stop codons are indicated by a dot (•).

Figure 25 is a listing of additional cDNA sequence obtained for CH14-2a16-1, comprising
approximately 1934 base pairs 5' from the sequence of Figure 19.

Figure 26 is a listing of the amino acid sequence corresponding to the longest open reading frame of
the DNA sequence of CH1-9a11-2 shown in Figure 25. The single-letter amino acid code is used.
Stop codons are indicated by a dot (•). The upper panel shows the complete amino acid translation;
the lower panel shows the predicted gene product protein sequence.

BEST MODE FOR CARRYING OUT THE INVENTION

This invention relates to the discovery and characterization of four novel genes associated
with breast cancer. The cDNA of these genes, and their sequences as disclosed below, provide the
basis of a series of reagents that can be used in diagnosis and therapy.

Using a panel of about 15 cancer cell lines, each of the four genes was found to be duplicated
in 40-60% of the cells tested. Surprisingly, each of the four genes was duplicated in at least one cell
line where studies using comparative genomic hybridization had not revealed any amplification of the
corresponding chromosomal region.

Levels of expression at the mRNA level were tested in a similar panel for two of these four
genes. In addition to those cell lines showing gene duplication, 17 to 37% of the lines showed RNA
overabundance without gene duplication, indicating that the malignant cells had used some
mechanism other than gene duplication to promote the abundance of RNA corresponding to these
genes. All four of the breast cancer genes have open reading frames, and likely are transcribed at
various levels in different cell types. Overabundance of the corresponding RNA in a cancerous cell is
likely associated with overexpression of the protein gene product. Such overexpression may be
manifest as increased secretion of the protein from the cell into blood or the surrounding environment,
an increased density of the protein at the cell surface, or an increased accumulation of the protein within
the cell, in comparison to the typical level in noncancerous cells of the same tissue type.

Different tumors bear different genotypes and phenotypes, even when derived from the same
tissue. Gene therapy in cancer is more likely to be effective if it is aimed at genes that are involved in
supporting the malignancy of the cancer. This invention discloses genes that achieve RNA
overabundance by several mechanisms, because they are more likely to be directly involved in the
pathogenic process, and therefore suitable targets for pharmacological manipulation.

Features of the four novel genes, the respective mRNA, and the cDNA used to find them are
provided in Table 1.

TABLE 1: Characteristics of 4 Novel Breast Cancer Genes

Chromosome   Designation    mRNA Observed     Exemplary cDNA Fragments Cloned
1            CH1-9a11-2     5.5 kb, 4.5 kb    1.1 kb, 2.5 kb
8            CH8-2a13-1     4.2 kb            0.6 kb (two), 3.0 kb, 4.0 kb
13           CH13-2a12-1    3.5 kb, 3.2 kb    1.6 kb, 3.5 kb
14           CH14-2a16-1    3.8 kb, 3 kb      0.8 kb, 1.3 kb, 1.6 kb, 2.5 kb


All four gene sequences are unrelated to other genes known to be overexpressed in breast
cancer, including the erbB2 gene (Adnane et al.), tissue factor (Chen et al.), mammaglobulin (Watson
et al.), and DD96 (Kocher et al.).

The four mRNA sequences each comprise an open reading frame. The CH1-9a11-2 gene is
expressed at the mRNA level at relatively elevated levels in pancreas and testis. The CH8-2a13-1
gene is expressed at relatively elevated levels in adult heart, spleen, thymus, small intestine, colon,
and tissues of the reproductive system; and at higher levels in certain tissues of the fetus. The
CH13-2a12-1 gene is expressed at relatively elevated levels in heart, skeletal muscle, and testis. The
CH14-2a16-1 gene is expressed at relatively elevated levels in testis. The level of expression of all four
genes is especially high in a substantial proportion of breast cancer cell lines.

The CH1-9a11-2 gene encodes a protein with a putative transmembrane region, and may be
expressed as a surface protein on cancer cells. The CH13-2a12-1 gene is distantly related to a C.
elegans gene implicated in cell cycle regulation, and may play a role in the regulation of cell
proliferation. The protein encoded by CH13-2a12-1 is distantly related to a vasopressin-activated
calcium binding receptor, and may have Ca2+ binding activity. The CH14-2a16-1 comprises at least
five domains of a zinc finger binding motif and is distantly related to a yeast RNA binding protein. The
CH14-2a16-1 gene product is suspected of having DNA or RNA binding activity, which may relate to a
role in cancer pathogenesis.

The four genes described here are exemplars of genes that undergo altered expression in
cancer, identifiable using the gene screening methods of the invention. The method involves an
analysis for both DNA duplication and altered RNA abundance relating to the same gene. Since
abnormal gene regulation is central to the malignant process, the identification method may be
brought to bear on any type of cancer.
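
As an illustration of the two-part analysis, the following Python sketch sorts cell lines by whether they show gene duplication, RNA overabundance, or both. The cell line names, the true/false observations, and the function name are hypothetical placeholders and are not data from this application.

```python
from collections import Counter


def classify_cell_line(duplicated: bool, rna_overabundant: bool) -> str:
    """Assign one of four categories from the two screening observations."""
    if duplicated and rna_overabundant:
        return "duplication with RNA overabundance"
    if rna_overabundant:
        return "RNA overabundance without duplication"
    if duplicated:
        return "duplication without RNA overabundance"
    return "neither"


# Hypothetical panel: name -> (gene duplicated?, RNA overabundant?)
panel = {
    "line_A": (True, True),
    "line_B": (False, True),
    "line_C": (False, False),
    "line_D": (True, True),
}

summary = Counter(classify_cell_line(d, r) for d, r in panel.values())
for category, count in summary.items():
    print(f"{category}: {count} of {len(panel)}")
```

Lines falling in the "RNA overabundance without duplication" category are the ones that suggest a mechanism other than gene duplication.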

The screening method is superior to any previously available approach in several respects.
Particularly significant is that screening is rapidly focused towards genes that are central to the
malignant process, and away from those that have variable levels of expression as part of normal
metabolic processes. Furthermore, because the end-product is a cDNA corresponding to the
gene, the process leads rapidly to detailed characterization of the gene, and any effector molecule
it may encode. This in turn leads to development of new diagnostic and therapeutic materials and
techniques.

Definitions

Terms used in this application include the following:

The term "polynucleotide" refers to a polymeric form of nucleotides of any length, either
deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any
three-dimensional structure, and may perform any function, known or unknown. The following are
non-limiting examples of polynucleotides: a gene or gene fragment, exons, introns, messenger RNA
(mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched
polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence,
nucleic acid probes, and primers. A polynucleotide may comprise modified nucleotides, such as
methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure
may be imparted before or after assembly of the polymer. The sequence of nucleotides may be
interrupted by non-nucleotide components. A polynucleotide may be further modified after
polymerization, such as by conjugation with a labeling component.

The term polynucleotide, as used herein, refers interchangeably to double- and
single-stranded molecules. Unless otherwise specified or required, any embodiment of the invention
described herein that is a polynucleotide encompasses both the double-stranded form, and each of
two complementary single-stranded forms known or predicted to make up the double-stranded form.

In the context of polynucleotides, a "linear sequence" or a "sequence" is an order of
nucleotides in a polynucleotide in a 5' to 3' direction in which residues that neighbor each other in the
sequence are contiguous in the primary structure of the polynucleotide. A "partial sequence" is a
linear sequence of part of a polynucleotide which is known to comprise additional residues in one or
both directions.

"Hybridization" refers to a reaction in which one or more polynucleotides react to form a
complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues. The
hydrogen bonding is sequence-specific, and typically occurs by Watson-Crick base pairing. A
hybridization reaction may constitute a step in a more extensive process, such as the initiation of a
PCR, or the enzymatic cleavage of a polynucleotide by a ribozyme.

Hybridization reactions can be performed under conditions of different "stringency". Relevant
conditions include temperature, ionic strength, time of incubation, the presence of additional solutes in
the reaction mixture such as formamide, and the washing procedure. Higher stringency conditions
are those conditions, such as higher temperature and lower sodium ion concentration, which require
higher minimum complementarity between hybridizing elements for a stable hybridization complex to
form. Conditions that increase the stringency of a hybridization reaction are widely known and
published in the art: see, for example, "Molecular Cloning: A Laboratory Manual", Second Edition
(Sambrook, Fritsch & Maniatis, 1989).
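
The dependence of stringency on salt and formamide can be illustrated with a commonly quoted empirical approximation for the melting temperature of a DNA-DNA hybrid, often attributed to Meinkoth and Wahl. The formula and the numerical inputs in this Python sketch are assumptions brought in for illustration; they are not conditions specified in this application, and the approximation is usually applied to hybrids longer than short oligonucleotides.

```python
import math


def approx_tm_celsius(na_molar: float, gc_percent: float,
                      formamide_percent: float, length_bp: int) -> float:
    """Empirical estimate: 81.5 + 16.6*log10[Na+] + 0.41*%GC - 0.61*%formamide - 500/L."""
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / length_bp)


# Illustrative inputs only: a 40 base pair duplex with 50% GC content.
high_salt = approx_tm_celsius(1.0, 50.0, 0.0, 40)
low_salt = approx_tm_celsius(0.1, 50.0, 0.0, 40)
with_formamide = approx_tm_celsius(1.0, 50.0, 50.0, 40)

print(f"1.0 M Na+, no formamide:  {high_salt:.1f} C")
print(f"0.1 M Na+, no formamide:  {low_salt:.1f} C")
print(f"1.0 M Na+, 50% formamide: {with_formamide:.1f} C")
```

Lower sodium concentration and added formamide both lower the estimated melting temperature, so a fixed incubation temperature becomes more stringent under those conditions.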

When hybridization occurs in an antiparallel configuration between two single-stranded
polynucleotides, those polynucleotides are described as "complementary". A double-stranded
polynucleotide can be "complementary" to another polynucleotide, if hybridization can occur between
one of the strands of the first polynucleotide and the second. Complementarity (the degree that one
polynucleotide is complementary with another) is quantifiable in terms of the proportion of bases in
opposing strands that are expected to form hydrogen bonding with each other, according to generally
accepted base-pairing rules.
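
The proportion described above can be sketched in Python as follows. The function name, the requirement that both strands have equal length, and the example sequences are assumptions made for illustration; only the standard A-T, A-U and G-C pairings are counted.

```python
# Base pairs counted as hydrogen bonded under Watson-Crick rules.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
                ("G", "C"), ("C", "G")}


def complementarity(strand_a: str, strand_b: str) -> float:
    """Fraction of positions that pair when the strands align antiparallel.

    Both strands are written 5' to 3'; strand_b is reversed so that its
    3' end faces the 5' end of strand_a.
    """
    a = strand_a.upper()
    b = strand_b.upper()[::-1]
    if not a or len(a) != len(b):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(1 for x, y in zip(a, b) if (x, y) in WATSON_CRICK)
    return paired / len(a)


print(complementarity("ATGC", "GCAT"))  # fully complementary strands
print(complementarity("ATGC", "GCAA"))  # one mismatched position
```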

A linear sequence of nucleotides is "identical" to another linear sequence, if the order of
nucleotides in each sequence is the same, and occurs without substitution, deletion, or material
substitution. It is understood that purine and pyrimidine nitrogenous bases with similar structures can
be functionally equivalent in terms of Watson-Crick base-pairing; and the inter-substitution of like
nitrogenous bases, particularly uracil and thymine, or the modification of nitrogenous bases, such as
by methylation, does not constitute a material substitution. An RNA and a DNA polynucleotide have
identical sequences when the sequence for the RNA reflects the order of nitrogenous bases in the
polyribonucleotides, the sequence for the DNA reflects the order of nitrogenous bases in the
polydeoxyribonucleotides, and the two sequences satisfy the other requirements of this definition.
Where one or both of the polynucleotides being compared is double-stranded, the sequences are
identical if one strand of the first polynucleotide is identical with one strand of the second
polynucleotide.
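
A minimal Python sketch of this definition follows, assuming sequences are given as plain strings written 5' to 3'. It treats uracil and thymine as equivalent and, for a double-stranded molecule, considers both strands. The function names are hypothetical, and base modifications such as methylation are not represented in a plain string, so they are ignored here.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def normalize(seq: str) -> str:
    """Upper-case the sequence and treat uracil as equivalent to thymine."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(normalize(seq)))


def strands(seq: str, double_stranded: bool) -> set:
    """All strands represented by a molecule, each written 5' to 3'."""
    s = normalize(seq)
    return {s, reverse_complement(s)} if double_stranded else {s}


def identical(seq1: str, seq2: str, ds1: bool = False, ds2: bool = False) -> bool:
    """True if any strand of the first molecule matches any strand of the second."""
    return bool(strands(seq1, ds1) & strands(seq2, ds2))


print(identical("AUGGC", "ATGGC"))            # RNA compared with DNA
print(identical("ATGGC", "GCCAT"))            # single strands that differ
print(identical("ATGGC", "GCCAT", ds1=True))  # first molecule double-stranded
```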

A linear sequence of nucleotides is "essentially identical" to another linear sequence, if both
sequences are capable of hybridizing to form a duplex with the same complementary polynucleotide.
Sequences that hybridize under conditions of greater stringency are more preferred. It is understood
that hybridization reactions can accommodate insertions, deletions, and substitutions in the nucleotide
sequence. Thus, linear sequences of nucleotides can be essentially identical even if some of the
nucleotide residues do not precisely correspond or align. In general, essentially identical sequences
of about 40 nucleotides in length will hybridize at about 30°C in 10 x SSC (0.15 M NaCl, 15 mM
citrate buffer); preferably, they will hybridize at about 40°C in 6 x SSC; more preferably, they will
hybridize at about 50°C in 6 x SSC; even more preferably, they will hybridize at about 60°C in 6 x
SSC, or at about 40°C in 0.5 x SSC, or at about 30°C in 6 x SSC containing 50% formamide; still
more preferably, they will hybridize at 40°C or higher in 2 x SSC or lower in the presence of 50% or
more formamide. It is understood that the rigor of the test is partly a function of the length of the
polynucleotide; hence shorter polynucleotides with the same homology should be tested under lower
stringency and longer polynucleotides should be tested under higher stringency, adjusting the
conditions accordingly. The relationship between hybridization stringency, degree of sequence
identity, and polynucleotide length is known in the art and can be calculated by standard formulae;
see, e.g., Meinkoth et al. Sequences that correspond or align more closely to the invention disclosed
herein are comparably more preferred. Generally, essentially identical sequences are at least about
50% identical with each other, after alignment of the homologous regions. Preferably, the sequences
are at least about 60% identical; more preferably, they are at least about 70% identical; more
preferably, they are at least about 80% identical; more preferably, the sequences are at least about
90% identical; even more preferably, they are at least 95% identical; still more preferably, the
sequences are 100% identical. Percent identity is calculated as the percent of residues in the
sequence being compared that are identical to those in the reference sequence, which is usually one
of those listed or described in this application, unless stated otherwise. No penalty is imposed for
introduction of gaps in the reference or comparison sequence for purposes of alignment, but the
resulting fragments must be rationally derived; small gaps may not be introduced to trivially improve
the identity score.
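
The percent identity calculation can be sketched in Python as below. The sketch assumes the two sequences have already been aligned, with "-" marking a gap, and it reads the no-penalty rule as meaning that any aligned position containing a gap is simply skipped. That reading, the function name, and the example sequences are assumptions made for illustration.

```python
def percent_identity(reference_aligned: str, compared_aligned: str) -> float:
    """Percent of compared residues identical to the reference, ignoring gap columns."""
    if len(reference_aligned) != len(compared_aligned):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for ref, cmp_res in zip(reference_aligned.upper(), compared_aligned.upper()):
        if ref == "-" or cmp_res == "-":
            continue  # gap column: no penalty, not counted
        compared += 1
        if ref == cmp_res:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residues to compare")
    return 100.0 * matches / compared


# Hypothetical alignment with one gap in the reference and one mismatch.
score = percent_identity("ATGC-GTA", "ATGCAGTT")
print(f"{score:.1f}% identical")
```

In this example seven positions are compared and six match, so the gap column does not lower the score but the mismatch does.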

In determining whether polynucleotide sequences are essentially identical, a sequence that
preserves the functionality of the polynucleotide with which it is being compared is particularly
preferred. Functionality may be established by different criteria, such as ability to hybridize with a
target polynucleotide, and whether the polynucleotide encodes an identical or essentially identical
polypeptide. Thus, nucleotide substitutions which cause a non-conservative substitution in the
encoded polypeptide are preferred over nucleotide substitutions that create a stop codon; nucleotide
substitutions that cause a conservative substitution in the encoded polypeptide are more preferred,
and identical nucleotide sequences are even more preferred. Insertions or deletions in the
polynucleotide that result in insertions or deletions in the polypeptide are preferred over those that
result in the down-stream coding region being rendered out of phase. The relative importance of
hybridization properties and the polypeptide encoded by a polynucleotide depends on the application
of the invention.
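
The order of preference described above can be sketched in Python using the standard genetic code. The grouping of amino acids with like side chains, the function names, and the example codons are assumptions made for illustration; the application does not define which substitutions are conservative at this point.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed groups of amino acids with like side chains (illustrative only).
GROUPS = [set("AVLIM"), set("FWY"), set("STNQ"), set("KRH"), set("DE")]


def classify_substitution(ref_codon: str, new_codon: str) -> str:
    """Describe the effect of replacing one codon with another."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    new_aa = CODON_TABLE[new_codon.upper()]
    if new_aa == ref_aa:
        return "same amino acid"
    if new_aa == "*":
        return "stop codon"
    if any(ref_aa in group and new_aa in group for group in GROUPS):
        return "conservative"
    return "non-conservative"


def indel_keeps_reading_frame(length_change: int) -> bool:
    """An insertion or deletion stays in phase if its length is a multiple of 3."""
    return length_change % 3 == 0


print(classify_substitution("GAT", "GAA"))  # Asp to Glu
print(classify_substitution("GAT", "GTT"))  # Asp to Val
print(classify_substitution("TGG", "TGA"))  # Trp to stop
print(indel_keeps_reading_frame(-3), indel_keeps_reading_frame(2))
```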

A "reagent" polynucleotide, polypeptide, or antibody, is a substance provided for a reaction,
the substance having some known and desirable parameters for the reaction. A reaction mixture may
also contain a "target", such as a polynucleotide, antibody, or polypeptide that the reagent is capable
of reacting with. For example, in some types of diagnostic tests, the amount of the target in a sample
is determined by adding a reagent, allowing the reagent and target to react, and measuring the
amount of reaction product. In the context of clinical management, a "target" may also be a cell,
collection of cells, tissue, or organ that is the object of an administered substance, such as a
pharmaceutical compound.

"cDNA" or "complementary DNA" is a single- or double-stranded DNA polynucleotide in which
one strand is complementary to a messenger RNA. "Full-length cDNA" is cDNA comprised of a strand
which is complementary to an entire messenger RNA molecule. A "cDNA fragment" as used herein
generally represents a sub-region of the full-length form, but the entire full-length cDNA may also be
included. Unless explicitly specified, the term cDNA encompasses both the full-length form and the
fragment form.

Different polynucleotides are said to "correspond" to each other if one is ultimately derived
from another. For example, messenger RNA corresponds to the gene from which it is transcribed.
cDNA corresponds to the RNA from which it has been produced, such as by a reverse transcription
reaction, or by chemical synthesis of a DNA based upon knowledge of the RNA sequence. cDNA
also corresponds to the gene that encodes the RNA. Polynucleotides may be said to correspond
even when one of the pair is derived from only a portion of the other.

A "probe" when used in the context of polynucleotide manipulation refers to a polynucleotide
which is provided as a reagent to detect a target potentially present in a sample of interest by
hybridizing with the target. Usually, a probe will comprise a label or a means by which a label can be
attached, either before or subsequent to the hybridization reaction. Suitable labels include, but are not
limited to radioisotopes, fluorochromes, chemiluminescent compounds, dyes, and enzymes.

A "primer" is a short polynucleotide, generally with a free 3' -OH group, that binds to a target
potentially present in a sample of interest by hybridizing with the target, and thereafter promoting
polymerization of a polynucleotide complementary to the target. A "polymerase chain reaction"
("PCR") is a reaction in which replicate copies are made of a target polynucleotide using one or more
primers, and a catalyst of polymerization, such as a reverse transcriptase or a DNA polymerase, and
particularly a thermally stable polymerase enzyme. Methods for PCR are taught in U.S. Patent Nos.
4,683,195 (Mullis) and 4,683,202 (Mullis et al.). All processes of producing replicate copies of the
same polynucleotide, such as PCR or gene cloning, are collectively referred to herein as "replication."

An "operon" is a genetic region comprising a gene encoding a protein and functionally related
5' and 3' flanking regions. Elements within an operon include but are not limited to promoter regions,
enhancer regions, repressor binding regions, transcription initiation sites, ribosome binding sites,
translation initiation sites, protein encoding regions, introns and exons, and termination sites for
transcription and translation. A "promoter" is a DNA region capable under certain conditions of
binding RNA polymerase and initiating transcription of a coding region located downstream (in the 3'
direction) from the promoter. "Operably linked" refers to a juxtaposition of genetic elements, wherein
the elements are in a relationship permitting them to operate in the expected manner. For instance, a
promoter is operably linked to a coding region if the promoter helps initiate transcription of the coding
sequence. There may be intervening residues between the promoter and coding region so long as
this functional relationship is maintained.

"Gene duplication" is a term used herein to describe the process whereby an increased
number of copies of a particular gene or a fragment thereof is present in a particular cell or cell line.
"Gene amplification" generally is synonymous with gene duplication.

"Expression" is defined alternately in the scientific literature either as the transcription of a
gene into an RNA polynucleotide, or as the transcription and subsequent translation into a
polypeptide. As used herein, "expression" or "gene expression" generally refers to the production of
the RNA unless specified or required otherwise. Thus, "RNA overexpression" reflects the presence of
more RNA (as a proportion of total RNA) from a particular gene in a cell being described, such as a
cancerous cell, in relation to that of the cell it is being compared with, such as a non-cancerous cell.
The protein product of the gene may or may not be produced in normal or abnormal amounts.
"Protein overexpression" similarly reflects the presence of relatively more protein present in or
produced by, for example, a cancerous cell.

"Abundance" of RNA refers to the amount of a particular RNA present in a particular cell type.
Thus, "RNA overabundance" or "overabundance of RNA" describes RNA that is present in greater
proportion of total RNA in the cell type being described, compared with the same RNA as a proportion
of the total RNA in a control cell. A number of mechanisms may contribute to RNA overabundance in
a particular cell type: for example, gene duplication, increased level of transcription of the gene,
increased persistence of the RNA within the cell after it is produced, or any combination of these.
Similarly, "lower abundance" or "underabundance" describes RNA that is present in lower
proportion in the cell being described compared with a control cell.
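
As a worked illustration of this definition, the Python sketch below compares a particular RNA as a proportion of total RNA in a test cell with the same proportion in a control cell. The signal values are hypothetical, the function names are placeholders, and the use of a ratio of exactly 1 as the dividing line is an assumption; the definition above only requires a greater or lower proportion.

```python
def abundance_ratio(test_target: float, test_total: float,
                    control_target: float, control_total: float) -> float:
    """Ratio of (target / total RNA) in the test cell to that in the control cell."""
    if min(test_total, control_total, control_target) <= 0:
        raise ValueError("totals and control target must be positive")
    # Equivalent to (test_target / test_total) / (control_target / control_total).
    return (test_target * control_total) / (test_total * control_target)


def describe(ratio: float) -> str:
    """Label the ratio using the terms defined in the text."""
    if ratio > 1.0:
        return "overabundance"
    if ratio < 1.0:
        return "underabundance"
    return "same proportion as control"


# Hypothetical signal values in arbitrary units.
ratio = abundance_ratio(test_target=30.0, test_total=1000.0,
                        control_target=10.0, control_total=1000.0)
print(f"{ratio:.1f}-fold relative to control: {describe(ratio)}")
```

Normalizing to total RNA in each cell is what separates a real change in relative abundance from a difference in the amount of sample loaded.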

The terms "polypeptide", "peptide" and "protein" are used interchangeably herein to refer to
polymers of amino acids of any length. The polymer may be linear or branched, it may comprise
modified amino acids, and it may be interrupted by non-amino acids. The terms also encompass an
amino acid polymer that has been modified: for example, disulfide bond formation, glycosylation,
lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling
component.

In the context of polypeptides, a "linear sequence" or a "sequence" is an order of amino acids
in a polypeptide in an N-terminal to C-terminal direction in which residues that neighbor each other in
the sequence are contiguous in the primary structure of the polypeptide. A "partial sequence" is a
linear sequence of part of a polypeptide which is known to comprise additional residues in one or both
directions.

A linear sequence of amino acids is "essentially identical" to another sequence if the two
sequences have a substantial degree of sequence identity. It is understood that functional
proteins can accommodate insertions, deletions, and substitutions in the amino acid sequence. Thus,
linear sequences of amino acids can be essentially identical even if some of the residues do not
precisely correspond or align. Sequences that correspond or align more closely to the invention
disclosed herein are more preferred. It is also understood that some amino acid substitutions are
more easily tolerated. For example, substitution of an amino acid with hydrophobic side chains,
aromatic side chains, polar side chains, side chains with a positive or negative charge, or side chains
comprising two or fewer carbon atoms, by another amino acid with a side chain of like properties can
occur without disturbing the essential identity of the two sequences. Methods for determining
homologous regions and scoring the degree of homology are well known in the art; see for example
Altschul et al. and Henikoff et al. Well-tolerated sequence differences are referred to as "conservative
substitutions". Thus, sequences with conservative substitutions are preferred over those with other
substitutions in the same positions; sequences with identical residues at the same positions are still
more preferred. In general, amino acid sequences that are essentially identical are at least about
15% identical, and comprise at least about another 15% which are either identical or are conservative
substitutions, after alignment of homologous regions. More preferably, essentially identical
sequences comprise at least about 50% identical residues or conservative substitutions; more
preferably, they comprise at least about 70% identical residues or conservative substitutions; more
preferably, they comprise at least about 80% identical residues or conservative substitutions; more
preferably, they comprise at least about 90% identical residues or conservative substitutions; more
preferably, they comprise at least about 95% identical residues or conservative substitutions; even
more preferably, they contain 100% identical residues.

WO 97/38085    PCT/US97/05930
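
The minimum thresholds for essentially identical amino acid sequences can be illustrated with the Python sketch below. It assumes the sequences are already aligned with "-" marking gaps, skips gap positions, and uses an assumed assignment of amino acids to the side-chain classes named above. The group memberships, function names, and example peptides are hypothetical and are not taken from this application.

```python
# Assumed side-chain classes (one-letter codes); a residue may belong to several.
SIDE_CHAIN_GROUPS = {
    "hydrophobic": set("AVLIM"),
    "aromatic": set("FWY"),
    "polar": set("STNQ"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "two_or_fewer_carbons": set("GAS"),
}


def is_conservative(a: str, b: str) -> bool:
    """True if two different residues share at least one side-chain class."""
    return a != b and any(a in g and b in g for g in SIDE_CHAIN_GROUPS.values())


def score_alignment(reference: str, compared: str):
    """Return (% identical, % identical or conservative) over ungapped positions."""
    if len(reference) != len(compared):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(r, c) for r, c in zip(reference.upper(), compared.upper())
             if r != "-" and c != "-"]
    if not pairs:
        raise ValueError("no aligned residues to compare")
    same = sum(1 for r, c in pairs if r == c)
    similar = sum(1 for r, c in pairs if is_conservative(r, c))
    return 100.0 * same / len(pairs), 100.0 * (same + similar) / len(pairs)


def meets_minimum(reference: str, compared: str) -> bool:
    """At least about 15% identical plus another 15% identical or conservative."""
    identical_pct, similar_pct = score_alignment(reference, compared)
    return identical_pct >= 15.0 and similar_pct >= 30.0


identical_pct, similar_pct = score_alignment("MKTAYIAK", "MRSAYLAE")
print(identical_pct, similar_pct, meets_minimum("MKTAYIAK", "MRSAYLAE"))
```

A scoring matrix of the kind described in the cited literature would normally replace the simple group lookup used here.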

In determining whether polypeptide sequences are essentially identical, a sequence that
preserves the functionality of the polypeptide with which it is being compared is particularly preferred.
Functionality may be established by different parameters, such as enzymatic activity, the binding rate
or affinity in a receptor-ligand interaction, the binding affinity with an antibody, and X-ray
crystallographic structure.

An "antibody" (interchangeably used in plural form) is an immunoglobulin molecule capable of
specific binding to a target, such as a polypeptide, through at least one antigen recognition site,
located in the variable region of the immunoglobulin molecule. As used herein, the term
encompasses not only intact antibodies, but also fragments thereof, mutants thereof, fusion proteins,
humanized antibodies, and any other modified configuration of the immunoglobulin molecule that
comprises an antigen recognition site of the required specificity.

The term "antigen" refers to the target molecule that is specifically bound by an antibody
through its antigen recognition site. The antigen may, but need not, be chemically related to the
immunogen that stimulated production of the antibody. The antigen may be polyvalent, or it may be a
monovalent hapten. Examples of kinds of antigens that can be recognized by antibodies include
polypeptides, polynucleotides, other antibody molecules, oligosaccharides, complex lipids, drugs, and
chemicals. An "immunogen" is an antigen capable of stimulating production of an antibody when
injected into a suitable host, usually a mammal. Compounds may be rendered immunogenic by many
techniques known in the art, including crosslinking or conjugating with a carrier to increase valency,
mixing with a mitogen to increase the immune response, and combining with an adjuvant to enhance
presentation.

An "active vaccine" is a pharmaceutical preparation for human or animal use, which is used
with the intention of eliciting a specific immune response. The immune response may be either
humoral or cellular, systemic or secretory. The immune response may be desired for experimental
purposes, for the treatment of a particular condition, for the elimination of a particular substance, or for
prophylaxis against a particular condition or substance.

An "isolated" polynucleotide, polypeptide, protein, antibody, or other substance refers to a
preparation of the substance devoid of at least some of the other components that may also be
present where the substance or a similar substance naturally occurs or is initially obtained from.
Thus, for example, an isolated substance may be prepared by using a purification technique to enrich
it from a source mixture. Enrichment can be measured on an absolute basis, such as weight per
volume of solution, or it can be measured in relation to a second, potentially interfering substance
present in the source mixture. Increasing enrichments of the embodiments of this invention are
increasingly more preferred. Thus, for example, a 2-fold enrichment is preferred, 10-fold enrichment
is more preferred, 100-fold enrichment is more preferred, 1000-fold enrichment is even more
preferred. A substance can also be provided in an isolated state by a process of artificial assembly,
such as by chemical synthesis or recombinant expression.
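
The two ways of measuring enrichment can be illustrated with a short Python sketch. The quantities are hypothetical values in arbitrary units, and the function names and the helper that reports the highest threshold reached are assumptions made for illustration.

```python
def absolute_enrichment(conc_before: float, conc_after: float) -> float:
    """Fold change in concentration, for example weight per volume of solution."""
    return conc_after / conc_before


def relative_enrichment(target_before: float, interferent_before: float,
                        target_after: float, interferent_after: float) -> float:
    """Fold change in the ratio of target to a second, interfering substance."""
    # Equivalent to (target_after / interferent_after) / (target_before / interferent_before).
    return (target_after * interferent_before) / (interferent_after * target_before)


def highest_threshold_met(fold: float) -> int:
    """Largest of the enrichment levels named in the text that is reached (0 if none)."""
    for threshold in (1000, 100, 10, 2):
        if fold >= threshold:
            return threshold
    return 0


# Hypothetical purification: the target rises while the interferent falls.
fold = relative_enrichment(target_before=1.0, interferent_before=100.0,
                           target_after=8.0, interferent_after=4.0)
print(f"{fold:.0f}-fold enrichment; meets the {highest_threshold_met(fold)}-fold level")
```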

A polynucleotide used in a reaction, such as a probe used in a hybridization reaction, a primer
used in a PCR, or a polynucleotide present in a pharmaceutical preparation, is referred to as "specific"
or "selective" if it hybridizes or reacts with the intended target more frequently, more rapidly, or with
greater duration than it does with alternative substances. Similarly, an antibody is referred to as
"specific" or "selective" if it binds via at least one antigen recognition site to the intended target more
frequently, more rapidly, or with greater duration than it does to alternative substances. A
polynucleotide or antibody is said to "selectively inhibit" or "selectively interfere with" a reaction if it
inhibits or interferes with the reaction between particular substrates to a greater degree or for a
greater duration than it does with the reaction between alternative substrates. An antibody is capable
of "specifically delivering" a substance if it conveys or retains that substance near a particular cell type
more frequently or for a greater duration compared with other cell types.

The "effector component" of a pharmaceutical preparation is a component which modifies
target cells by altering their function in a desirable way when administered to a subject bearing the
cells. Some advanced pharmaceutical preparations also have a "targeting component", such as an
antibody, which helps deliver the effector component more efficaciously to the target site. Depending
on the desired action, the effector component may have any one of a number of modes of action. For
example, it may restore or enhance a normal function of a cell, it may eliminate or suppress an
abnormal function of a cell, or it may alter a cell's phenotype. Alternatively, it may kill or render
dormant a cell with pathological features, such as a cancer cell. Examples of effector components are
provided in a later section.

A "pharmaceutical candidate" or "drug candidate" is a compound believed to have therapeutic
potential, that is to be tested for efficacy. The "screening" of a pharmaceutical candidate refers to
conducting an assay that is capable of evaluating the efficacy and/or specificity of the candidate. In
this context, "efficacy" refers to the ability of the candidate to affect the cell or organism it is
administered to in a beneficial way: for example, the limitation of the pathology of cancerous cells.

A "cell line" or "cell culture" denotes higher eukaryotic cells grown or maintained in vitro. It is
understood that the descendants of a cell may not be completely identical (either morphologically,
genotypically, or phenotypically) to the parent cell. Cells described as "uncultured" are obtained
directly from a living organism, and have been maintained for a limited amount of time away from the
organism: not long enough or under conditions for the cells to undergo substantial replication.

"Genetic alteration" refers to a process wherein a genetic element is introduced into a cell
other than by mitosis or meiosis. The element may be heterologous to the cell, or it may be an
additional copy or improved version of an element already present in the cell. Genetic alteration
may be effected, for example, by transfecting a cell with a recombinant plasmid or other
polynucleotide through any process known in the art, such as electroporation, calcium phosphate
precipitation, or contacting with a polynucleotide-liposome complex, or by transduction or infection
with a DNA or RNA virus or viral vector. The alteration is preferably but not necessarily inheritable
by progeny of the altered cell.

A "host cell" is a cell which has been genetically altered, or is capable of being genetically
altered, by administration of an exogenous polynucleotide.

The terms "cancerous cell" or "cancer cell", used either in the singular or plural form, refer to
cells that have undergone a malignant transformation that makes them pathological to the host
organism. Malignant transformation is a single- or multi-step process, which involves in part an
alteration in the genetic makeup of the cell and/or the expression profile. Malignant transformation
may occur either spontaneously, or via an event or combination of events such as drug or chemical
treatment, radiation, fusion with other cells, viral infection, or activation or inactivation of particular
genes. Malignant transformation may occur in vivo or in vitro, and can if necessary be experimentally
induced.

A frequent feature of cancer cells is the tendency to grow in a manner that is uncontrollable
by the host, but the pathology associated with a particular cancer cell may take another form, as
outlined infra. Primary cancer cells (that is, cells obtained from near the site of malignant
transformation) can be readily distinguished from non-cancerous cells by well-established techniques,
particularly histological examination. The definition of a cancer cell, as used herein, includes not only
a primary cancer cell, but any cell derived from a cancer cell ancestor. This includes metastasized
cancer cells, and in vitro cultures and cell lines derived from cancer cells.

The "pathology" caused by a cancer cell within a host is anything that compromises the
well-being or normal physiology of the host. This may involve (but is not limited to) abnormal or
uncontrollable growth of the cell, metastasis, release of cytokines or other secretory products at an
inappropriate level, manifestation of a function inappropriate for its physiological milieu, interference
with the normal function of neighboring cells, aggravation or suppression of an inflammatory or
immunological response, or the harboring of undesirable chemical agents or invasive organisms.

"Treatment" of an individual or a cell is any type of intervention in an attempt to alter the
natural course of the individual or cell. For example, treatment of an individual may be undertaken to
decrease or limit the pathology caused by a cancer cell harbored in the individual. Treatment includes
(but is not limited to) administration of a composition, such as a pharmaceutical composition, and may
be performed either prophylactically, or subsequent to the initiation of a pathologic event or contact
with an etiologic agent. Effective amounts used in treatment are those which are sufficient to
produce the desired effect, and may be given in single or divided doses.

A "control cell" is an alternative source of cells or an alternative cell line used in an experiment
for comparison purposes. Where the purpose of the experiment is to establish a base line for gene
copy number or expression level, it is generally preferable to use a control cell that is not a cancer
cell. 

The term "cancer gene" as used herein refers to any gene which is yielding transcription or
translation products at a substantially altered level or in a substantially altered form in cancerous cells
compared with non-cancerous cells, and which may play a role in supporting the malignancy of the
cell. It may be a normally quiescent gene that becomes activated (such as a dominant
proto-oncogene), it may be a gene that becomes expressed at an abnormally high level (such as a
growth factor receptor), it may be a gene that becomes mutated to produce a variant phenotype, or it
may be a gene that becomes expressed at an abnormally low level (such as a tumor suppressor
gene). The present invention is directed towards the discovery of genes in all these categories.

It is understood that a "clinical sample" encompasses a variety of sample types obtained from 
a subject and useful in an in vitro procedure, such as a diagnostic test. The definition encompasses 
solid tissue samples obtained as a surgical removal, a pathology specimen, or a biopsy specimen, 
tissue cultures or cells derived therefrom and the progeny thereof, and sections or smears prepared
from any of these sources. Non-limiting examples are samples obtained from breast tissue, lymph
nodes, and tumors. The definition also encompasses blood, spinal fluid, and other liquid samples of
biologic origin, and may refer to either the cells or cell fragments suspended therein, or to the liquid
medium and its solutes. 

The term "relative amount" is used where a comparison is made between a test
measurement and a control measurement. Thus, the relative amount of a reagent forming a complex
in a reaction is the amount reacting with a test specimen, compared with the amount reacting with a
control specimen. The control specimen may be run separately in the same assay, or it may be part
of the same sample (for example, normal tissue surrounding a malignant area in a tissue section). 

A "differential" result is generally obtained from an assay in which a comparison is made
between the findings of two different assay samples, such as a cancerous cell line and a control cell
line. Thus, for example, "differential expression" is observed when the level of expression of a
particular gene is higher in one cell than another. "Differential display" refers to a display of a
component, particularly RNA, from different cells to determine if there is a difference in the level of the
component amongst different cells. Differential display of RNA is conducted, for example, by selective
production and display of cDNA corresponding thereto. A method for performing differential display is 
provided in a later section. 

A polynucleotide derived from or corresponding to CH1-9a11-2, CH8-2a13-1, CH13-2a12-1,
or CH14-2a16-1 is any of the following: the respective cDNA fragments, the corresponding
messenger RNA, including splice variants and fragments thereof, both strands of the corresponding
full-length cDNA and fragments thereof, and the corresponding gene. Isolated allelic variants of any
of these forms are included. This invention embodies any polynucleotide corresponding to CH1-9a11-2,
CH8-2a13-1, CH13-2a12-1, or CH14-2a16-1 in an isolated form. It also embodies any such
polynucleotide that has been cloned or transfected into a cell line.

When used in referring to the gene screening methods of this invention (such as those
outlined in the last paragraph), "displaying cDNA" is any technique in which DNA copies of RNA
(not restricted to mRNA) are rendered detectable in a quantitative or relatively quantitative fashion, in
that DNA copies present in a relatively greater amount in a first sample compared with a second
sample generate a relatively stronger or weaker signal compared with that of the second sample
due to the difference in copy number. Separate display of different cDNA in a preparation
(particularly but not limited to cDNA of different size) allows comparison of levels of a particular
cDNA between different samples. A preferred method of display is the differential display 
technique, and enhancements thereupon described in this disclosure and elsewhere. 

The term "digested" DNA encompasses DNA (particularly chromosomal DNA) that has
been fragmented by any suitable chemical or enzymatic means into fragments conveniently
separable by standard techniques, particularly gel electrophoresis. Digestion with a restriction
endonuclease specific for a particular nucleotide sequence is preferred.

"Hybridizing" in this context refers to contacting a first polynucleotide with a second
polynucleotide under conditions that permit the formation of a multi-stranded polynucleotide duplex
whenever one strand of the first polynucleotide has a sequence of sufficient complementarity to a
sequence on the second polynucleotide. The duplex may be a long-lived one, such as when one
DNA molecule is used as a labeled probe to detect another DNA molecule, that may optionally be
bound to a nitrocellulose filter or present in a separating gel. The duplex may also be a shorter-lived
one, such as when one DNA molecule is used to prime an amplification reaction of the other
DNA molecule, and the amplified product is subsequently detected. The practitioner may alter the
conditions of the reaction to alter the degree of complementarity required, as long as sequence 
specificity remains a determining factor in the reaction. 

Unless explicitly indicated or otherwise required by the techniques used, the steps of a
method of this invention may be performed in any order, or combined where desired and
appropriate. In one example, in the method comprising steps a) through h) that is described
above, it is entirely appropriate to conduct steps a) to c) of the method either before or after steps
e) to g) of the method, as long as the cDNA ultimately selected fulfills the criteria of both step d)
and step h). In another example, screening against different digested DNA preparations, even if
outlined separately, may optionally be done at the same time. All permutations of this kind are
within the scope of the invention. 

General methods

The practice of the present invention will employ, unless otherwise indicated, conventional
techniques of molecular biology, microbiology, recombinant DNA, and immunology, which are within
the skill of the art. Such techniques are explained fully in the literature. See, for example, "Molecular
Cloning: A Laboratory Manual", Second Edition (Sambrook, Fritsch & Maniatis, 1989);
"Oligonucleotide Synthesis" (M.J. Gait, ed., 1984); "Animal Cell Culture" (R.I. Freshney, ed., 1987);
the series "Methods in Enzymology" (Academic Press, Inc.); "Handbook of Experimental Immunology"
(D.M. Weir & C.C. Blackwell, eds.); "Gene Transfer Vectors for Mammalian Cells" (J.M. Miller & M.P.
Calos, eds., 1987); "Current Protocols in Molecular Biology" (F.M. Ausubel et al., eds., 1987); and
"Current Protocols in Immunology" (J.E. Coligan et al., eds., 1991). All patents, patent applications,
articles and publications mentioned herein, both supra and infra, are hereby incorporated herein by
reference.

Features of the cancer gene screening method

The cancer gene screening methods of this invention may be brought to bear to discover
novel genes associated with cancer. Exemplars of cancer-associated genes identified by this
method are described below. The exemplars were identified using breast cancer cell lines and
tissue, but the strategy can be applied to any cancer type of interest.

A central feature of the cancer gene screening method of this invention is to look for both 
DNA duplication and RNA overabundance relating to the same gene. This feature is particularly 
powerful in the discovery of new and potentially important cancer genes. While amplicons occur
frequently in cancer, the presently available techniques indicate only the broad chromosomal
region involved in the duplication event, not the specific genes involved. The present invention
provides a way of detecting genes that may be present in an amplicon from a functional basis.
Because an early part of the method involves detecting RNA, the method avoids genes that may
be duplicated in an amplicon but are quiescent (and therefore irrelevant) in the cancer cells.
Furthermore, it recruits active genes from a duplicated region of the chromosome too small to be
detectable by the techniques used to describe amplicons.

Near the heart of this approach are several concepts. One is that genes encoding
products implicated positively in the malignant process achieve elevated gene expression as a part
of malignant transformation. In this context, "gene expression" refers to expression at the RNA
transcription level. Most typically, the RNA is in turn translated into a protein with a particular
enzymatic, binding, or regulatory activity which increases after malignant transformation. In a less
common example, the RNA may encode or participate as a ribozyme, antisense polynucleotide, or
other functional nucleic acid molecule during malignancy. In a third example, RNA expression may
be incidental but symptomatic of an important event in transformation.

Another concept is that overexpression, if central to malignant transformation, may be
achieved in different tumors by different mechanisms, and that at least one such possible
mechanism is gene duplication. Accordingly, a substantial proportion of transformed cells will have
an amplicon, or duplicated region of a chromosome, that includes within its compass the
overexpressed gene. Other transformed cells may achieve RNA overabundance without gene
duplication, such as by increasing the rate of transcription of the gene (e.g., by upregulation of the
promoter region), by enhancing transcript promotion or transport, or by increasing mRNA survival.

Thus, the method entails screening at the RNA level several cancer cell lines or tumors,
and several normal cell lines or tissue samples, at the same time. RNA are selected that show a
consistent elevation amongst the cancer cells as compared with normal cells. Additional strategies
may be employed in combination with the RNA screening to improve the success rate of the
method. One such strategy is to use several cancer cell lines that are all known to have duplicated
genes in the same region of a particular chromosome. Thus, the RNA that emerge from the screen
are more likely to represent a deliberate overexpression event, and the overexpressed gene is
likely to be within the duplicated region. A supplemental strategy is to use freshly prepared tissue
samples rather than cell lines as controls for base-line expression. This avoids selection of genes
that may alter their expression level just as a result of tissue culturing. Another supplemental
strategy is to conduct an additional level of screening, following identification of shared,
overexpressed RNA. The selected RNA are used to screen DNA from suitable cancer cells and
normal cells, to ensure that at least a proportion of the cells achieved the overexpression by way of
gene duplication.

The strategy for detecting such genes comprises a number of innovations over those that
have been used in previous work.

The first part of the method is based on a search for particular RNAs that are overabundant
in cancer cells. A first innovation of the method is to compare RNA abundance between control
cells and several different cancer cells or cancer cell lines of the desired type. The cDNA
fragments that emerge in a greater amount in several different cancer lines, but not in control cells,
are more likely to reflect genes that are important in disease progression, rather than those that
have undergone secondary or coincidental activation. It is particularly preferred to use cancer cells
that are known to share a common duplicated chromosomal region.

A second innovation of this method is to supply as control, not RNA from a cell line or 

culture, but from fresh tissue samples of non-malignant origin. There are two reasons for this.
First, the tissue will provide the spectrum of expression that is typical to the normal cell phenotype,
rather than individual differences that may become more prominent in culture. This establishes a
more reliable baseline for normal expression levels. More importantly, the tissue will be devoid of
the effects that in vitro culturing may have in altering or selecting particular phenotypes. For
example, proto-oncogenes or growth factors may become up-regulated in culture. When cultured
cells are used as the control for differential display, these up-regulated genes would be missed.

A third innovation of this method is to undertake a subselection for cDNA corresponding to
genes that achieve their RNA overabundance in a substantial proportion of cancer cells by gene
duplication. To accomplish this, appropriate cDNA corresponding to overabundant RNA identified
in the foregoing steps are used to probe digests of cellular DNA from a panel of different cancer
cells, and from normal genomic DNA. cDNA that show evidence of higher copy numbers in a
proportion of the panel are selected for further characterization. An additional advantage of this
step is that cDNA corresponding to mitochondrial genes can rapidly be screened away by including
a mitochondrial DNA digest as an additional sample for testing the probe. This eliminates most of
the false-positive cDNA, which otherwise make up a majority of the cDNA identified.
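
The subselection just described can be sketched in code. This is only an illustration under stated assumptions: the data layout, the probe names, the 2-fold copy-number cutoff, and the 30% panel fraction are hypothetical choices, not values given in this disclosure.

```python
def select_duplicated(probes, min_ratio=2.0, min_fraction=0.3):
    """Return names of cDNA probes with higher copy number in part of the panel.

    probes maps a probe name to a dict with:
      "mito":   True if the probe hybridizes to the mitochondrial DNA digest
      "ratios": signal in each cancer cell DNA digest divided by the signal
                in normal genomic DNA (1.0 means no change in copy number)
    min_ratio and min_fraction are assumed cutoffs chosen for illustration.
    """
    selected = []
    for name, data in probes.items():
        if data["mito"]:
            continue  # mitochondrial cDNA is screened away as a false positive
        ratios = data["ratios"]
        amplified = sum(1 for r in ratios if r >= min_ratio)
        if ratios and amplified / len(ratios) >= min_fraction:
            selected.append(name)
    return selected


# Hypothetical panel of four cancer cell DNA digests.
panel = {
    "probe_A": {"mito": False, "ratios": [4.0, 1.0, 3.0, 1.1]},
    "probe_B": {"mito": True, "ratios": [5.0, 6.0, 5.5, 4.8]},
    "probe_C": {"mito": False, "ratios": [1.0, 0.9, 1.2, 1.0]},
}
print(select_duplicated(panel))  # ['probe_A']
```

Here probe_B is discarded because it hybridizes to the mitochondrial digest, and probe_C because no panel member shows a raised copy number.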

Thus, the identification of genes yielding products that are present at abnormal levels is
accomplished by a method comprised of the following steps. 

To identify particular RNA that is overabundant in cancer cells, RNA is prepared from both
cancerous and control cells by standard techniques. Cancer-associated genes may affect cellular
metabolism by any one of a number of mechanisms. For example, they may encode ribozymes,
anti-sense polynucleotides, DNA-binding polynucleotides, altered ribosomal RNA, and the like.
The gene screening methods of this invention may employ a comparison of RNA abundance levels
at the total RNA level, not strictly limited to mRNA. However, the vast majority of cancer-associated
genes are predicted to encode a protein whose up-regulation is closely linked to
the metabolic process. For example, the four exemplary breast cancer genes described elsewhere
in this application all comprise an open reading frame. Accordingly, a focus on mRNA enriches the
selectable pool for candidate cancer-associated genes. Focus towards mRNA can be conducted
at any step in the method. It is particularly convenient to use a display method that displays cDNA
copied only from mRNA. In this case, whole RNA may be prepared and analyzed from cancer and
control cell populations without separating out mRNA.

In terms of the cancer cells used as an RNA source, it is particularly advantageous to use
a plurality of cancer cells known to contain a duplicated gene or chromosomal segment in the same
region of the chromosome. The duplicated segment need not be the same size in all the cells, nor
is it necessary that the number of duplications be the same, so long as there is at least some part
of the duplicated segment that is shared amongst all the cancer cells used in the screen. Thus, a
minimum of two, and preferably at least three cancer cells are used that are sufficiently
characterized to identify a shared duplicated region, and can be used as a source of RNA for the
screening test. In contrast, the control cell population will not comprise chromosomal duplications.
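
The requirement that some part of the duplicated segment be shared amongst all the cancer cells amounts to an interval intersection. The sketch below assumes, for illustration only, that each cell line contributes one duplicated segment on the same chromosome, given as hypothetical (start, end) map coordinates in arbitrary units.

```python
def shared_region(segments):
    """Intersect one duplicated segment per cancer cell line.

    segments is a list of (start, end) coordinates on the same chromosome.
    Returns the (start, end) of the part common to all segments, or None if
    the segments do not all overlap.
    """
    if len(segments) < 2:
        raise ValueError("at least two cancer cell lines are needed")
    start = max(s for s, _ in segments)
    end = min(e for _, e in segments)
    return (start, end) if start < end else None


# Three hypothetical cell lines with segments of different sizes.
print(shared_region([(10, 50), (30, 80), (20, 45)]))  # (30, 45)
```

The segments differ in size, as the text allows, yet still define a common region that the RNA screen can be biased towards.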

Assuming the duplication to be related to the malignancy of the cancer cells, RNA
transcribed from the duplicated region is expected to be overabundant compared with that of the
control cell. Accordingly, a highly effective strategy is to identify overabundant RNA that is present
in all (or at least several) of the cancer cell preparations, but none of the control preparations. By
using cancer cells that share a duplicated chromosomal region, the RNA comparison will be
strongly biased in favor of RNA overabundance transcribed from the shared duplicated region.
Since the shared region is optimally only a small segment of a single chromosome, expression
differences arising from elsewhere in the genome in one cancer cell or another will not be selected.
We have found that this is highly effective in eliminating: a) RNA abundance differences resulting
from normal metabolic variations between cells; and/or b) RNA abundance differences related to
cancer cell malignancy, but occurring secondarily to malignant transformation. This is important,
because it considerably minimizes the chief deficiency in the use of RNA comparison methods,
particularly differential display, for the screening of potential cancer genes: namely, the onerous 
number of false-positives that such techniques generate. 
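
The selection rule in the preceding paragraph (overabundant in all, or at least several, of the cancer preparations but in none of the controls) can be illustrated as a simple filter over band intensities. The 2-fold cutoff, the background floor, the control lane names, and the intensity values are assumptions made for this sketch; only the three cancer cell line names come from the text.

```python
def select_bands(bands, cancer_lanes, control_lanes,
                 fold=2.0, background=1.0, min_cancer=None):
    """Pick bands that are overabundant in cancer lanes relative to controls.

    bands maps a band name to {lane name: measured intensity}.
    A cancer lane counts as overabundant when its intensity is at least
    `fold` times the strongest control lane (or the background floor).
    min_cancer is the number of cancer lanes that must qualify; by default
    all of them must.
    """
    if min_cancer is None:
        min_cancer = len(cancer_lanes)
    chosen = []
    for band, lanes in bands.items():
        baseline = max(max(lanes[c] for c in control_lanes), background)
        hits = sum(1 for c in cancer_lanes if lanes[c] >= fold * baseline)
        if hits >= min_cancer:
            chosen.append(band)
    return chosen


cancer = ["BT474", "SKBR3", "MCF7"]
controls = ["control_1", "control_2"]  # hypothetical control lanes
bands = {
    "band_1": {"BT474": 10, "SKBR3": 12, "MCF7": 9, "control_1": 2, "control_2": 3},
    "band_2": {"BT474": 10, "SKBR3": 2, "MCF7": 3, "control_1": 2, "control_2": 2},
    "band_3": {"BT474": 8, "SKBR3": 9, "MCF7": 8, "control_1": 7, "control_2": 8},
}
print(select_bands(bands, cancer, controls))                # ['band_1']
print(select_bands(bands, cancer, controls, min_cancer=1))  # ['band_1', 'band_2']
```

Requiring all cancer lanes to qualify discards band_2, which is elevated in only one line and so is more likely to reflect a secondary or coincidental difference.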

Shared duplicated regions in cancer cells may be identified by a relevant analytical 

technique, or by reference to such analysis already conducted and published. One approach that
has been highly effective in mapping approximate sub-chromosomal locations of duplicated
segments is comparative genomic hybridization (CGH). This technique involves extracting,
amplifying and labeling DNA from the subject cell; hybridizing to reference metaphase
chromosomes treated to remove repetitive sequences; and observing the position of the hybridized
DNA on the chromosomes (WO 93/18186; Gray et al.). The greater the signal intensity at a given
position, the greater the copy number of the sequences in the subject cell. Thus, regions showing
elevated staining correspond to genes duplicated in the cancer cells, while regions showing
diminished staining correspond to genes deleted in the cancer cells. Related techniques of which a
practitioner in the art will be well aware are methods for preparing and using repeat sequence
chromosome-specific nucleic acid probes (US 5,427,932; Weier et al.), methods for staining target
chromosomal DNA using labeled nucleic acid fragments in conjunction with blocking fragments
complementary to repetitive DNA segments (US 5,447,841; Gray et al.), and methods for detecting
amplified or deleted chromosomal regions using a mapped library of labeled polynucleotide probes
(US 5,472,842; Stokke et al.). If desired, multiple fluorochromes can be used as labeling agents
with CGH and related techniques, to provide a three-color visualization of deleted, normal, and 
duplicated chromosome abnormalities (Lucas et al.). 

The choice of a particular chromosomal mapping approach is irrelevant, especially once
the duplicated region is known. If the location of the chromosome duplication is
already established for a cell line to be used in RNA comparison during the course of the present
invention, then it is unnecessary to conduct a mapping technique de novo. For example,
established cancer cell lines exist for which mapping data is already available in the public domain.
Provided in the reference section of this application is a list of over 40 articles in which the
locations of duplicated regions in particular cancer cells are described. In the context of the
present invention, a plurality of cancer cells is chosen for the screening panel based on such data,
so that they share a duplicated chromosomal region. The chromosomal location of a suspected
duplication may be confirmed by hybridization analysis, if desired, using a probe specific for the
location. 

The cancer cells used for RNA comparison are also generally (but not necessarily) derived 
from the same type of cancer or the same tissue. Using cells derived from the same type of cancer
increases the probability that the gene ultimately identified will be common in that type of cancer,
and suitable as a type-specific diagnostic marker. Using cells derived from different types of
cancer is in effect a search for cancer-related genes that are less tissue specific and more related
to the malignant process in general. Both types of genes are of interest for both diagnostic and
therapeutic purposes. In one illustration highlighted in Example 1, RNA was screened from the
three breast cancer cell lines BT474, SKBR3, and MCF7, which have been determined by CGH or
Southern analysis to share duplicated genetic regions in chromosomes 1, 8, 14, 17, and 20.
When the RNA from these cells was displayed, a number of RNA were found to be overabundant
in the cancer cells, but not controls (Figure 1). Three RNA overabundant in all three cancer cell
lines corresponded to cancer-associated genes located on chromosomes 1, 8, and 14 that are
listed in Table 1. The chromosome 13 gene (CH13-2a12-1) was overexpressed in 2 of the 3 cell
lines; namely BT474 and SKBR3. Southern analysis subsequently established that the 
chromosome 13 gene was duplicated in the same two cell lines (Example 6, Table 5). 



Selection of the source or sources of control cell RNA is also a matter of some refinement.
The control RNA can be derived from in vitro cultures of non-malignant cells, or established cell
lines derived from a non-malignant source. However, it is preferable for the control RNA to be
obtained directly from normal human tissue of the same type as the cancer cells. This is because
most normal cells do not proliferate indefinitely; hence adaptation of a cell into a cell line involves a
degree of transformation. The transforming event may, in turn, be shared with that of certain
cancer cells, at least at the level of RNA abundance. Hence, comparison of the RNA levels in
cancer cells with so-called control cell lines may lead the practitioner to miss genes that are related
to malignancy. For convenience, control cells may be maintained in culture for a brief period
before the experiment, and even stimulated; however, multiple rounds of cell division are to be
avoided if possible. Use of both stimulated and unstimulated cells as controls may help provide
RNA patterns corresponding to the normal range of abundance within various metabolic events of
the cell cycle. In one illustration highlighted in Example 1, RNA was screened using both
proliferating and non-proliferating cells. As stated, the screening of breast cancer RNA is
preferably conducted using uncultured normal mammary epithelial cells (termed "organoids") as
sources of control RNA. These cells may be obtained from surgical samples resected from healthy
breast tissue.

The RNA is preserved until use in the comparison experiment in such a way as to minimize
fragmentation. To facilitate confirmation experiments, it is useful to use RNA of a reproducible
character. For this reason, it is convenient to use RNA that has been obtained from stable
cancerous cell lines and/or ready tissue sources, although reproducibility can also be provided by 
preparing enough RNA so that it can be preserved in aliquots. 

For displaying relative overabundance of RNA in the cancer cells, compared with the
control cells, many standard techniques are suitable. These would include any form of subtractive
hybridization or comparative analysis. Preferred are techniques in which more than two RNA
sources are compared at the same time, such as various types of arbitrarily primed PCR
fingerprinting techniques (Welsh et al., Yoshikawa et al.). Particularly preferred are differential
mRNA display methods and variations thereof, in which the samples are run in neighboring lanes in
a separating gel. These techniques are focused towards mRNA by using primers that are specific
for the poly-A tail characteristic of mRNA (Liang et al., 1992a; U.S. Patent 5,262,311).

Because many thousands of genes are expressed in the cells of higher organisms at any 
one time, it is preferable to improve the legibility of the display by surveying only a subset of the 
RNA at a time. Methods for accomplishing this are known in the art. A preferred method is by
using selective primers that initiate PCR replication for a subset of the RNA. Thus, the RNA is first
reverse transcribed by standard techniques. Short primers are used for the selection, preferably
chosen such that alternative primers used in a series of like assays can complete a comprehensive
survey of the mRNA. 

In a preferred example, primers can be used for the 3' region of the mRNAs which have an
oligo-dT sequence, followed by two other nucleotides (TiNM, where i = 11, N ∈ {A,C,G}, and M ∈
{A,C,G,T}). Thus, 12 possible primers are required to complete the survey. A random or arbitrary
primer of minimal length can then be used for replication towards what corresponds in the
sequence to the 5' region of the mRNA. The optimal length for the random primer is about 10
nucleotides. The product of the PCR reaction is labeled with a radioisotope, such as 35S. The
labeled cDNA is then separated by molecular weight, such as on a polyacrylamide sequencing gel.
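
The count of 12 anchored primers follows from the 3 choices for N multiplied by the 4 choices for M. A short enumeration, given purely as an illustration (the function name is hypothetical), makes this explicit:

```python
from itertools import product


def anchored_primers(t_run=11, first="ACG", second="ACGT"):
    """List the oligo-dT primers of the form T(i)NM.

    t_run is the number of T residues (i = 11 in the text), `first` holds
    the allowed bases for N and `second` the allowed bases for M.
    """
    return ["T" * t_run + n + m for n, m in product(first, second)]


primers = anchored_primers()
print(len(primers))  # 12
for primer in primers:
    print(primer)
```

Each primer is 13 nucleotides long, so a series of 12 like assays, one per primer, completes the survey described above.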

If desired, variations on the differential display technique may be employed. For example, 
one-base oligo-dT primers may be used (Liang et al., 1993 & 1994), although this is generally less 
preferred because the display pattern is correspondingly more complex. Selection of primers may
be optimized mathematically depending on the number of RNA species in a tissue of interest
(Bauer et al.). The method may be adapted for non-denaturing gels, and for use with automatic
DNA sequencers (Bauer et al.). Alternative radioisotopes (Trentmann et al.) or fluorochromes (Sun
et al.) may be used for labeling the differential display. Differential display may optionally be
combined with a ribonuclease protection assay (Yeatman et al.). PCR primers may optionally
incorporate a restriction site to facilitate cloning (Linskens et al., Ayala et al.). Using Taq
polymerase from multiple manufacturers can increase the amount of variation under otherwise
identical conditions (Haag et al.). Nested PCR primers may be used in differential display to
decrease background created by oligo-dT primers (WO 95/33760). Other variants of the
differential display technique are known in the art and described inter alia in the references cited in
this disclosure. The use of such modifications is within the scope of the present invention, but is
not required, as evidenced by the examples described below.

Based on the comparison of relative abundance of RNA, particular RNAs are chosen which
are present as a higher proportion of the RNA in cancerous cells, compared with control cells.
When using the differential display method, the cDNA corresponding to overabundant RNA will
produce a band with greater proportional intensity amongst neighboring cDNA bands, compared
with the proportional intensity in the control lanes. Desired cDNAs can be recovered most directly
by cutting the spot in the gel corresponding to the band, and recovering the DNAs therefrom.
Recovered cDNA can be replicated again for further use by any technique or combination of
techniques known in the art, including PCR and cloning into a suitable carrier.
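
The comparison of proportional band intensity between cancer and control lanes can be pictured with a small calculation. The Python sketch below is a hypothetical illustration only: the band labels, the intensity readings, and the two-fold threshold are all assumptions chosen for the example, and the text itself does not prescribe a numerical cutoff.

```python
def proportional_intensity(lane, band):
    """Fraction of the summed intensity of a lane's bands contributed by one band."""
    return lane[band] / sum(lane.values())

def overabundant_in(cancer_lanes, control_lane, band, fold=2.0):
    """Names of cancer lanes in which `band` has at least `fold` times
    the proportional intensity seen in the control lane (assumed cutoff)."""
    baseline = proportional_intensity(control_lane, band)
    return [name for name, lane in cancer_lanes.items()
            if proportional_intensity(lane, band) >= fold * baseline]

# Hypothetical densitometry readings for four neighboring bands.
control = {"b1": 100, "b2": 100, "b3": 100, "b4": 100}
cancer = {
    "A": {"b1": 50, "b2": 300, "b3": 50, "b4": 100},
    "B": {"b1": 100, "b2": 120, "b3": 90, "b4": 90},
}
print(overabundant_in(cancer, control, "b2"))  # ['A']
```

Expressing each band as a fraction of its own lane is what makes the comparison insensitive to differences in the total amount of material loaded per lane.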

An optional but highly beneficial additional screening step, typically performed
subsequently to an RNA comparison as described above, is aimed at identifying genes that are
duplicated in a substantial proportion of cancers. This is conducted by using cDNA such as
selected from differential display to probe digests of chromosomal DNA obtained from two or more
cancerous cells, such as cancer cell lines. Chromosomal DNA from non-cancerous cells that
essentially reflects the germ line in terms of gene copy number is used for the control. A preferred
source of control DNA in experiments for human cancer genes is placental DNA, which is readily
obtainable. The DNA samples are cleaved at sequence-specific sites along the chromosome, most
usually with a suitable restriction enzyme into fragments of appropriate size. The DNA can be
blotted directly onto a suitable medium, or separated on an agarose gel before blotting. The latter
method is preferred, because it enables a comparison of the hybridizing chromosomal restriction
fragment to determine whether the probe is binding to the same fragment in all samples. The
amount of probe binding to DNA digests from each of the cancer cells is compared with the amount
binding to control DNA.

Because the comparison is quantitative, it is preferable to standardize the measurement
internally. One method is to administer a second probe to the same blot, probing for a second
chromosomal gene unlikely to be duplicated in the cancer cells. This method is preferred, because
it standardizes not only for differences in the amount of DNA provided, but also for differences in
the amount transferred during blotting. This can be accomplished by using alternative labels for
the two probes, or by stripping the first probe with a suitable eluant before administering the
second.
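
The internal standardization described above amounts to a ratio of ratios. The Python sketch below shows one way the arithmetic could be arranged; the function name, parameter names, and signal values are hypothetical and are not taken from the examples of this disclosure.

```python
def relative_copy_number(test_signal, ref_signal, ctrl_test_signal, ctrl_ref_signal):
    """Estimate gene copy number in a cancer sample relative to control DNA.

    The test-probe signal in each lane is divided by the signal of the
    second (reference) probe in the same lane, which cancels differences
    in the amount of DNA loaded and transferred. The normalized cancer
    value is then divided by the normalized control value.
    """
    if min(ref_signal, ctrl_test_signal, ctrl_ref_signal) <= 0:
        raise ValueError("reference and control signals must be positive")
    return (test_signal / ref_signal) / (ctrl_test_signal / ctrl_ref_signal)

# Hypothetical densitometry readings (arbitrary units).
ratio = relative_copy_number(test_signal=900.0, ref_signal=150.0,
                             ctrl_test_signal=200.0, ctrl_ref_signal=100.0)
print(ratio)  # 3.0
```

A ratio near 1 would indicate the same relative copy number as the control DNA, while a ratio well above 1 would be consistent with duplication of the gene in the cancer sample.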

To eliminate cDNA for mitochondrial genes, it is preferable to include in a parallel analysis
a mitochondrial DNA preparation digested with the same restriction enzyme. Any cDNA probe that
hybridizes to the appropriate mitochondrial restriction fragments can be suspected of
corresponding to a mitochondrial gene.

In the initial replication of the RNA, the random primer may bind at any location along the
RNA sequence. Thus, the copied and replicated segment may be a fragment of the full-length
RNA. Longer cDNA corresponding to a greater portion of the sequence can be obtained, if
desired, by several techniques known to practitioners of ordinary skill. These include using the
cDNA fragment to isolate the corresponding RNA, or to isolate complementary DNA from a cDNA
library of the same species. Preferably, the library is derived from the same tissue source, and
more preferably from a cancer cell line of the same type. For example, for cDNA corresponding to
human breast cancer genes, a preferred library is derived from breast cancer cell line BT474,
constructed in lambda gt10.

Sequences of the cDNA can be determined by standard techniques, or by submitting the
sample to commercial sequencing services. The chromosomal locations of the genes can be
determined by any one of several methods known in the art, such as in situ hybridization using
chromosomal smears, or panels of somatic cell hybrids of known chromosomal composition.

The cDNA obtained through the selection process outlined can then be tested against a
larger panel of cancer cell lines and/or fresh tumor cells to determine what proportion of the cells
have duplicated the gene. This can be accomplished by using the cDNA as a probe for
chromosomal DNA digests, as described earlier. As illustrated in the Example section, a preferred
method for conducting this determination is Southern analysis.

The cDNA can also be used to determine what proportion of the cells have RNA
overabundance. This can be accomplished by standard techniques, such as slot blots or blots of
agarose gels, using whole RNA or messenger RNA from each of the cells in the panel. The blots
are then probed with the cDNA using standard techniques. It is preferable to provide an internal
loading and blotting control for this analysis. A preferred method is to re-probe the same blot for
transcripts of a gene likely to be present at about the same level in all cells of the same type, such
as the gene for a cytoskeletal protein. Thus, a preferred second probe is the cDNA for beta-actin.

Using a novel cDNA found by this selection procedure, it is anticipated that essentially all
cancer cells showing gene duplication will also show RNA overabundance, but that some will show
RNA overabundance without gene duplication.

The practitioner will readily appreciate that the strategies for identifying genes that are
duplicated and/or associated with RNA overabundance may be reversed appropriately to screen
for genes that are deleted and/or associated with RNA underabundance. The principles are
essentially the same. Genes that are frequently down-regulated in cancer (such as tumor
suppressor genes) may be down-regulated by different mechanisms in different cells, and a gene
with this behavior is more likely to be central to malignant transformation or persistence of the
malignant state.

To screen for such down-regulated genes according to the present invention, RNA is
prepared from a plurality of tumors or cancer cell lines and the abundance is compared with RNA
preparation from control cells. Again, it is highly preferable to use cancer cells that share a deleted
gene in the same chromosomal region, in order to focus any differences at the RNA level towards
particular alterations in cancer cells and away from normal variations or coincidental changes. The
CGH technique may be used to identify deletions in previously uncharacterized cancer cells. As
before, cancer cells may be chosen on the basis of previous knowledge of deleted regions; there is
no need to conduct methods such as CGH on previously characterized lines. cDNA from the RNA
of cancer cells is displayed (preferably by differential display) alongside cDNA copied from
(preferably uncultured) control cells, and cDNA is selected that appears to be underrepresented in
at least two (preferably more) of the cancer cells compared with the control cells. cDNA thus
selected may optionally be further screened against digested DNA preparations, to confirm that the
RNA underabundance observed in the cancer cell populations is attributable in at least a proportion
of the cells to an actual gene deletion.

As before, the cDNA may be used for sequencing or rescuing additional polynucleotides, in
this case not from the cancer cells but from cells containing or expressing the gene at normal
levels. Pharmaceuticals based on deleted genes or those associated with underexpressed RNA
are typically oriented at restoring or upregulating the gene, or a functional equivalent of the
encoded gene product.

The identification of four exemplary cancer associated genes

To identify particular RNA that is overabundant in cancer cells, RNA has been compared
between breast cancer cells and control cells. The amount of total cellular RNA was compared using
a modified differential display method. Primers were used for the 3' region of the mRNAs which have
an oligo-dT sequence, followed by two other nucleotides as described in the previous section.
Random or arbitrary primers of about 10 nucleotides were used for replication towards what
corresponds in the sequence to the 5' region of the mRNA. The labeled amplification product was
then separated by molecular weight on a polyacrylamide sequencing gel.

Particular mRNAs were chosen that were present in a higher proportion of the RNA in
cancerous cells, compared with control cells, according to the proportional intensity amongst
neighboring cDNA bands. The cDNA was recovered directly from the gel and amplified to provide a
probe for screening. Candidate polynucleotides were screened by a number of criteria, including both
Northern and Southern analysis to determine if the corresponding genes were duplicated or
responsible for RNA overabundance in breast cancer cells. Sequence data of the polynucleotides
were obtained and compared with sequences in GenBank. Novel polynucleotides with the desired
expression patterns were used to probe for longer cDNA inserts in a λgt10 library constructed from
the breast cancer cell line BT474, which were then sequenced.

Further description of the actual experimental events that occurred during identification of the
four exemplary genes, and sequence data for CH1-9a11-2, CH8-2a13-1, CH13-2a12-1 and CH14-
2a16-1, are provided in the Example section.

Preparation of polynucleotides, polypeptides and antibodies

Polynucleotides based on the cDNA of CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and CH14-
2a16-1 can be rescued from cloned plasmids and phage provided as part of this invention. They
may also be obtained from breast cancer cell libraries or mRNA preparations, or from normal human
tissues such as placenta, by judicious use of primers or probes based on the sequence data provided
herein. Alternatively, the sequence data provided herein can be used in chemical synthesis to
produce a polynucleotide with an identical sequence, or that incorporates occasional variations.

Polypeptides encoded by the corresponding mRNA can be prepared by several different
methods, all of which will be known to a practitioner of ordinary skill. For example, the appropriate
strand of the full-length cDNA can be operably linked to a suitable promoter, and transfected into a
suitable host cell. The host cell is then cultured under conditions that allow transcription and
translation to occur, and the polypeptide is subsequently recovered. Another convenient method is to
determine the polynucleotide sequence of the cDNA, and predict the polypeptide sequence according
to the genetic code. A polypeptide can then be prepared directly, for example, by chemical synthesis,
either identical to the predicted sequence, or incorporating occasional variations.
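
Predicting a polypeptide sequence from a cDNA according to the genetic code is a mechanical lookup, as the following Python sketch illustrates. It uses the standard genetic code; the function name, the assumption that the coding strand and reading frame are already known, and the toy input sequence are all hypothetical and are not sequence data from this disclosure.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(cdna, start=0):
    """Translate the coding strand of a cDNA from `start` up to the first stop codon."""
    seq = cdna.upper()
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCCAAGTGA"))  # MAK
```

In practice all reading frames of both strands would normally be examined, and the longest open reading frame taken as the candidate coding sequence.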

Antibodies against polypeptides of this invention may be prepared by any method known in
the art. For stimulating antibody production in an animal, it is often preferable to enhance the
immunogenicity of a polypeptide by such techniques as polymerization with glutaraldehyde, or
combining with an adjuvant, such as Freund's adjuvant. The immunogen is injected into a suitable
experimental animal: preferably a rodent for the preparation of monoclonal antibodies; preferably a
larger animal such as a rabbit or sheep for preparation of polyclonal antibodies. It is preferable to
provide a second or booster injection after about 4 weeks, and begin harvesting the antibody source
no less than about 1 week later.

Sera harvested from the immunized animals provide a source of polyclonal antibodies.
Detailed procedures for purifying specific antibody activity from a source material are known within the
art. Unwanted activity cross-reacting with other antigens, if present, can be removed, for example, by
running the preparation over adsorbents made of those antigens attached to a solid phase, and
collecting the unbound fraction. If desired, the specific antibody activity can be further purified by such
techniques as protein A chromatography, ammonium sulfate precipitation, ion exchange
chromatography, high-performance liquid chromatography and immunoaffinity chromatography on a
column of the immunizing polypeptide coupled to a solid support.

Alternatively, immune cells such as splenocytes can be recovered from the immunized
animals and used to prepare a monoclonal antibody-producing cell line. See, for example, Harlow &
Lane (1988), U.S. Patent Nos. 4,491,632 (J.R. Wands et al.), U.S. 4,472,500 (C. Milstein et al.), and
U.S. 4,444,887 (M.K. Hoffman et al.).

Briefly, an antibody-producing line can be produced inter alia by cell fusion, or by transfecting
antibody-producing cells with Epstein Barr Virus, or transforming with oncogenic DNA. The treated
cells are cloned and cultured, and clones are selected that produce antibody of the desired specificity.
Specificity testing can be performed on culture supernatants by a number of techniques, such as
using the immunizing polypeptide as the detecting reagent in a standard immunoassay, or using cells
expressing the polypeptide in immunohistochemistry. A supply of monoclonal antibody from the
selected clones can be purified from a large volume of tissue culture supernatant, or from the ascites
fluid of suitably prepared host animals injected with the clone.

Effective variations of this method include those in which the immunization with the
polypeptide is performed on isolated cells. Antibody fragments and other derivatives can be prepared
by methods of standard protein chemistry, such as subjecting the antibody to cleavage with a
proteolytic enzyme. Genetically engineered variants of the antibody can be produced by obtaining a
polynucleotide encoding the antibody, and applying the general methods of molecular biology to
introduce mutations and translate the variant.

Use in diagnosis

Novel cDNA sequences corresponding to genes associated with cancer are potentially useful
as diagnostic aids. Similarly, polypeptides encoded by such genes, and antibodies specific for these
polypeptides, are also potentially useful as diagnostic aids.

More specifically, gene duplication or overabundance of RNA in particular cells can help
identify those cells as being cancerous, and thereby play a part in the initial diagnosis. Increased
levels of RNA corresponding to CH1-9a11-2, CH8-2a13-12, CH13-2a12-1, and CH14-2a16-1 are
present in a substantial proportion of breast cancer cell lines and primary breast tumors. In addition,
preliminary Northern analysis using probes for CH8-2a13-12, CH13-2a12-1, and CH14-2a16-1
indicates that these genes may be duplicated or be associated with RNA overabundance in certain
cell lines derived from cancers other than breast cancer, including colon cancer, lung cancer,
prostate cancer, glioma, and ovarian cancer.

For patients already diagnosed with cancer, gene duplication or overabundance of RNA can
assist with clinical management and prognosis. For example, overabundance of RNA may be a
useful predictor of disease survival, metastasis, susceptibility to various regimens of standard
chemotherapy, the stage of the cancer, or its aggressiveness. See generally the article by Blast, U.S.
Patent No. 4,968,603 (Slamon et al.) and PCT Application WO 94/00601 (Levine et al.). All of these
determinations are important in helping the clinician choose between the available treatment options.

A particularly important diagnostic application contemplated in this invention is the
identification of patients suitable for gene-specific therapy, as outlined in the following section. For
example, treatment directed against a particular gene or gene product is appropriate in cancers where
the gene is duplicated or there is RNA overabundance. Given a particular pharmaceutical that is
directed at a particular gene, a diagnostic test specific for the same gene is important in selecting
patients likely to benefit from the pharmaceutical. Given a selection of such pharmaceuticals specific
for different genes, diagnostic tests for each gene are important in selecting which pharmaceutical is
likely to benefit a particular patient.

The polynucleotide, polypeptide, and antibodies embodied in this invention provide specific
reagents that can be used in standard diagnostic procedures. The actual procedures for conducting
diagnostic tests are extensively known in the art, and are routine for a practitioner of ordinary skill.
See, for example, U.S. Patent No. 4,968,603 (Slamon et al.), and PCT Applications WO 94/00601
(Levine et al.) and WO 94/17414 (K. Keyomarsi et al.). What follows is a brief non-limiting survey of
some of the known procedures that can be applied.

Generally, to perform a diagnostic method of this invention, one of the compositions of this
invention is provided as a reagent to detect a target in a clinical sample with which it reacts. Thus, the
polynucleotide of this invention can be used as a reagent to detect a DNA or RNA target, such as
might be present in a cell with duplication or RNA overabundance of the corresponding gene. The
polypeptide can be used as a reagent to detect a target for which it has a specific binding site, such as
an antibody molecule or (if the polypeptide is a receptor) the corresponding ligand. The antibody can
be used as a reagent to detect a target it specifically recognizes, such as the polypeptide used as an
immunogen to raise it.

The target is supplied by obtaining a suitable tissue sample from an individual for whom the
diagnostic parameter is to be measured. Relevant test samples are those obtained from individuals
suspected of containing cancerous cells, particularly breast cancer cells. Many types of samples are
suitable for this purpose, including those that are obtained near the suspected tumor site by biopsy or
surgical dissection, in vitro cultures of cells derived therefrom, blood, and blood components. If
desired, the target may be partially purified from the sample or amplified before the assay is
conducted. The reaction is performed by contacting the reagent with the sample under conditions that
will allow a complex to form between the reagent and the target. The reaction may be performed in
solution, or on a solid tissue sample, for example, using histology sections. The formation of the
complex is detected by a number of techniques known in the art. For example, the reagent may be
supplied with a label and unreacted reagent may be removed from the complex; the amount of
remaining label thereby indicating the amount of complex formed. Further details and alternatives for
complex detection are provided in the descriptions that follow.

To determine whether the amount of complex formed is representative of cancerous or non-
cancerous cells, the assay result is compared with a similar assay conducted on a control sample. It
is generally preferable to use a control sample which is from a non-cancerous source, and otherwise
similar in composition to the clinical sample being tested. However, any control sample may be
suitable provided the relative amount of target in the control is known or can be used for comparative
purposes. Where the assay is being conducted on tissue sections, suitable control cells with normal
histopathology may surround the cancerous cells being tested. It is often preferable to conduct the
assay on the test sample and the control sample simultaneously. However, if the amount of complex
formed is quantifiable and sufficiently consistent, it is acceptable to assay the test sample and control
sample on different days or in different laboratories.

A polynucleotide embodied in this invention can be used as a reagent for determining gene
duplication or RNA overabundance that may be present in a clinical sample. The binding of the
reagent polynucleotide to a target in a clinical sample generally relies in part on a hybridization
reaction between a region of the polynucleotide reagent and the DNA or RNA in a sample being
tested.

If desired, the nucleic acid may be extracted from the sample, and may also be partially
purified. To measure gene duplication, the preparation is preferably enriched for chromosomal DNA;
to measure RNA overabundance, the preparation is preferably enriched for RNA. The target
polynucleotide can be optionally subjected to any combination of additional treatments, including
digestion with restriction endonucleases, size separation, for example by electrophoresis in agarose
or polyacrylamide, and affixing to a reaction matrix, such as a blotting material.

Hybridization is allowed to occur by mixing the reagent polynucleotide with a sample
suspected of containing a target polynucleotide under appropriate reaction conditions. This may be
followed by washing or separation to remove unreacted reagent. Generally, both the target
polynucleotide and the reagent must be at least partly equilibrated into the single-stranded form in
order for complementary sequences to hybridize efficiently. Thus, it may be useful (particularly in
tests for DNA) to prepare the sample by standard denaturation techniques known in the art.

The minimum complementarity between the reagent sequence and the target sequence for a
complex to form depends on the conditions under which the complex-forming reaction is allowed to
occur. Such conditions include temperature, ionic strength, time of incubation, the presence of
additional solutes in the reaction mixture such as formamide, and washing procedure. Higher
stringency conditions are those under which higher minimum complementarity is required for stable
hybridization to occur. It is generally preferable in diagnostic applications to increase the specificity of
the reaction, minimizing cross-reactivity of the reagent polynucleotide with alternative undesired
hybridization sites in the sample. Thus, it is preferable to conduct the reaction under conditions of
high stringency: for example, in the presence of high temperature, low salt, formamide, a combination
of these, or followed by a low-salt wash.

In order to detect the complexes formed between the reagent and the target, the reagent is
generally provided with a label. Some of the labels often used in this type of assay include
radioisotopes such as ³²P and ³³P, chemiluminescent or fluorescent reagents such as fluorescein, and
enzymes such as alkaline phosphatase that are capable of producing a colored solute or precipitant.
The label may be intrinsic to the reagent, it may be attached by direct chemical linkage, or it may be
connected through a series of intermediate reactive molecules, such as a biotin-avidin complex, or a
series of inter-reactive polynucleotides. The label may be added to the reagent before hybridization
with the target polynucleotide, or afterwards.

To improve the sensitivity of the assay, it is often desirable to increase the signal ensuing
from hybridization. This can be accomplished by replicating either the target polynucleotide or the
reagent polynucleotide, such as by a polymerase chain reaction. Alternatively, a combination of
serially hybridizing polynucleotides or branched polynucleotides can be used in such a way that
multiple label components become incorporated into each complex. See U.S. Patent No. 5,124,246
(Urdea et al.).

An antibody embodied in this invention can also be used as a reagent in cancer diagnosis, or
for determining gene duplication or RNA overabundance that may be present in a clinical sample.
This relies on the fact that overabundance of RNA in affected cells is often associated with increased
production of the corresponding polypeptide. Several of the genes upregulated in cancer cells
encode cell surface receptors; for example, erbB-2, c-myc and epidermal growth factor.
Alternatively, the RNA may encode a protein kept inside the cell, or it may encode a protein secreted
by the cell into the surrounding milieu.

Any such protein product can be detected in solid tissue samples and cultured cells by
immunohistological techniques that will be obvious to a practitioner of ordinary skill. Generally, the
tissue is preserved by a combination of techniques which may include cooling, exchanging into
different solvents, fixing with agents such as paraformaldehyde, or embedding in a commercially
available medium such as paraffin or OCT. A section of the sample is suitably prepared and overlaid
with a primary antibody specific for the protein.

The primary antibody may be provided directly with a suitable label. More frequently, the
primary antibody is detected using one of a number of developing reagents which are easily produced
or available commercially. Typically, these developing reagents are anti-immunoglobulin or protein A,
and they typically bear labels which include, but are not limited to: fluorescent markers such as
fluorescein, enzymes such as peroxidase that are capable of precipitating a suitable chemical
compound, electron dense markers such as colloidal gold, or radioisotopes. The section
is then visualized using an appropriate microscopic technique, and the level of labeling is compared
between the suspected cancer cell and a control cell, such as cells surrounding the tumor area or
those taken from an alternative site.

The amount of protein corresponding to the cancer-associated gene may be detected in a
standard quantitative immunoassay. If the protein is secreted or shed from the cell in any appreciable
amount, it may be detectable in plasma or serum samples. Alternatively, the target protein may be
solubilized or extracted from a solid tissue sample. Before quantitating, the protein may optionally be
affixed to a solid phase, such as by a blot technique or using a capture antibody.

A number of immunoassay methods are established in the art for performing the quantitation.
For example, the protein may be mixed with a pre-determined non-limiting amount of the reagent
antibody specific for the protein. The reagent antibody may contain a directly attached label, such as
an enzyme or a radioisotope, or a second labeled reagent may be added, such as
anti-immunoglobulin or protein A. For a solid-phase assay, unreacted reagents are removed by
washing. For a liquid-phase assay, unreacted reagents are removed by some other separation
technique, such as filtration or chromatography. The amount of label captured in the complex is
positively related to the amount of target protein present in the test sample. A variation of this
technique is a competitive assay, in which the target protein competes with a labeled analog for
binding sites on the specific antibody. In this case, the amount of label captured is negatively related
to the amount of target protein present in a test sample. Results obtained using any such assay on a
sample from a suspected cancer-bearing source are compared with those from a non-cancerous
source.
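
The positive relationship of the direct assay and the negative relationship of the competitive assay can both be handled by the same read-out step when the captured label is compared against standards of known concentration. The Python sketch below assumes such a set of standards and uses simple linear interpolation between them; the use of a standard curve, the function name, and all numerical values are assumptions made for illustration and are not described in the text.

```python
def interpolate_concentration(signal, standards):
    """Read a target concentration off a standard curve by linear interpolation.

    `standards` is a list of (concentration, signal) pairs whose signals are
    strictly monotonic in concentration: increasing for a direct assay,
    decreasing for a competitive assay.
    """
    points = sorted(standards, key=lambda p: p[1])  # order by signal
    for (c0, s0), (c1, s1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            fraction = (signal - s0) / (s1 - s0)
            return c0 + fraction * (c1 - c0)
    raise ValueError("signal lies outside the range of the standards")

# Hypothetical standards: (concentration, captured label), arbitrary units.
direct = [(0.0, 0.0), (10.0, 100.0), (20.0, 200.0)]
competitive = [(0.0, 200.0), (10.0, 100.0), (20.0, 0.0)]

print(interpolate_concentration(150.0, direct))       # 15.0
print(interpolate_concentration(150.0, competitive))  # 5.0
```

The same label reading therefore corresponds to a higher target concentration in the direct format and a lower one in the competitive format, which is the distinction drawn in the paragraph above.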

A polypeptide embodied in this invention can also be used as a reagent in cancer diagnosis,
or for determining gene duplication or RNA overabundance that may be present in a clinical sample.
Overabundance of RNA in affected cells may result in the corresponding polypeptide being produced
by the cells in an abnormal amount. On occasion, overabundance of RNA may occur concurrently
with expression of the polypeptide in an unusual form. This in turn may result in stimulation of the
immune response of the host to produce its own antibody molecules that are specific for the
polypeptide. Thus, a number of human hybridomas have been raised from cancer patients that
produce antibodies against their own tumor antigens.

To use the polypeptide in the detection of such antibodies in a subject suspected of having
cancer, an immunoassay is conducted. Suitable methods are generally the same as the
immunoassays outlined in the preceding paragraphs, except that the polypeptide is provided as a
reagent, and the antibody is the target in the clinical sample which is to be quantified. For example,
human IgG antibody molecules present in a serum sample may be captured with solid-phase protein
A, and then overlaid with the labeled polypeptide reagent. The amount of antibody would then be
proportional to the label attached to the solid phase. Alternatively, cells or tissue sections expressing
the polypeptide may be overlaid first with the test sample containing the antibody, and then with a
detecting reagent such as labeled anti-immunoglobulin. The amount of antibody would then be
proportional to the label attached to the cells. The amount of antibody detected in the sample from a
suspected cancerous source would be compared with the amount detected in a control sample.

These diagnostic procedures may be performed by diagnostic laboratories, experimental
laboratories, practitioners, or private individuals. This invention provides diagnostic kits which can be
used in these settings. The presence of cancer cells in the individual may be manifest in a clinical
sample obtained from that individual as an alteration in the DNA, RNA, protein, or antibodies
contained in the sample. An alteration in one of these components resulting from the presence of
cancer may take the form of an increase or decrease of the level of the component, or an alteration in
the form of the component, compared with that in a sample from a healthy individual. The clinical
sample is optionally pre-treated for enrichment of the target being tested for. The user then applies a
reagent contained in the kit in order to detect the changed level or alteration in the diagnostic
component.

Each kit necessarily comprises the reagent which renders the procedure specific: a reagent
polynucleotide, used for detecting target DNA or RNA; a reagent antibody, used for detecting target
protein; or a reagent polypeptide, used for detecting target antibody that may be present in a sample
to be analyzed. The reagent is supplied in a solid form or liquid buffer that is suitable for inventory
storage, and later for exchange or addition into the reaction medium when the test is performed.
Suitable packaging is provided. The kit may optionally provide additional components that are useful
in the procedure. These optional components include buffers, capture reagents, developing reagents,
labels, reacting surfaces, means for detection, control samples, instructions, and interpretive
information.



Use in pharmaceutical development



Embodied in this invention are modes of treating subjects bearing cancer cells that have
overabundance of the particular RNA described. The strategy used to obtain the cDNAs provided in
this invention was deliberately focused on genes that achieve RNA overabundance by gene
duplication in some cells, and by alternative mechanisms in other cells. These alternative
mechanisms may include, for example, translocation or enhancement of transcription enhancing
elements near the coding region of the gene, deletion of repressor binding sites, or altered production
of gene regulators. Such mechanisms would result in more RNA being transcribed from the same
gene. Alternatively, the same amount of RNA may be transcribed, but may persist longer in the cell,
resulting in greater abundance. This could occur, for example, by reduction in the level of ribozymes
or protein enzymes that degrade RNA, or in the modification of the RNA to render it more resistant to
such enzymes or spontaneous degradation.

Thus, different cells make use of at least two different mechanisms to achieve a single result:
the overabundance of a particular RNA. This suggests that RNA overabundance of these genes is
central to the cancer process in the affected cells. Interfering with the specific gene or gene product
would consequently modify the cancer process. It is an objective of this invention to provide
pharmaceutical compositions that enable therapy of this kind.

One way this invention achieves this objective is through screening candidate drugs. The
general screening strategy is to apply the candidate to a manifestation of a gene associated with
cancer, and then determine whether the effect is beneficial and specific. For example, a composition
that interferes with a polynucleotide or polypeptide corresponding to any of the novel cancer-associated
genes described herein has the potential to block the associated pathology when administered to a
tumor of the appropriate phenotype. It is not necessary that the mechanism of interference be known;
only that the interference be preferential for cancerous cells (or cells near the cancer site) but not
other cells.

A preferred method of screening is to provide cells in which a polynucleotide related to a
cancer gene has been transfected. See, for example, PCT application WO 93/08701. A practitioner
of ordinary skill will be well acquainted with techniques for transfecting eukaryotic cells, including the
preparation of a suitable vector, such as a viral vector; conveying the vector into the cell, such as by
electroporation; and selecting cells that have been transformed, such as by using a reporter or drug
sensitivity element.

A cell line is chosen which has a phenotype desirable in testing, and which can be maintained
well in culture. The cell line is transfected with a polynucleotide corresponding to one of the
cancer-associated genes identified herein. Transfection is performed such that the polynucleotide is
operably linked to a genetic controlling element that permits the correct strand of the polynucleotide to
be transcribed within the cell. Successful transfection can be determined by the increased abundance
of the RNA compared with an untransfected cell. It is not necessary that the cell previously be devoid
of the RNA, only that the transfection result in a substantial increase in the level observed. RNA
abundance in the cell is measured using the same polynucleotide, according to the hybridization
assays outlined earlier.

Drug screening is performed by adding each candidate to a sample of transfected cells, and
monitoring the effect. The experiment includes a parallel sample which does not receive the
candidate drug. The treated and untreated cells are then compared by any suitable phenotypic
criteria, including but not limited to microscopic analysis, viability testing, ability to replicate,
histological examination, the level of a particular RNA or polypeptide associated with the cells, the
level of enzymatic activity expressed by the cells or cell lysates, and the ability of the cells to interact
with other cells or compounds. Differences between treated and untreated cells indicate effects
attributable to the candidate. In a preferred method, the effect of the drug on the cell transfected with
the polynucleotide is also compared with the effect on a control cell. Suitable control cells include
untransfected cells of similar ancestry, cells transfected with an alternative polynucleotide, or cells
transfected with the same polynucleotide in an inoperative fashion. Optimally, the drug has a greater
effect on operably transfected cells than on control cells.

Desirable effects of a candidate drug include an effect on any phenotype that was conferred
by transfection of the cell line with the polynucleotide from the cancer-associated gene, or an effect
that could limit a pathological feature of the gene in a cancerous cell. Examples of the first type would
be a drug that limits the overabundance of RNA in the transfected cell, limits production of the
encoded protein, or limits the functional effect of the protein. The effect of the drug would be apparent
when comparing results between treated and untreated cells. An example of the second type would
be a drug that makes use of the transfected gene or a gene product to specifically poison the cell.
The effect of the drug would be apparent when comparing results between operably transfected cells
and control cells.

Use in treatment

This invention also provides gene-specific pharmaceuticals in which each of the
polynucleotides, polypeptides, and antibodies embodied herein is a specific active ingredient in
pharmaceutical compositions. Such compositions may decrease the pathology of cancer cells on
their own, or render the cancer cells more susceptible to treatment by non-specific agents, such
as classical chemotherapy or radiation.

An example of how polynucleotides embodied in this invention can be effectively used in
treatment is gene therapy. See, for example, Morgan et al., Culver et al., and U.S. Patent No.
5,399,346 (French et al.). The general principle is to introduce the polynucleotide into a cancer cell in
a patient, and allow it to interfere with the expression of the corresponding gene, such as by
complexing with the gene itself or with the RNA transcribed from the gene. Entry into the cell is
facilitated by suitable techniques known in the art, such as providing the polynucleotide in the form of a
suitable vector, or encapsulation of the polynucleotide in a liposome. The polynucleotide may be
provided to the cancer site by an antigen-specific homing mechanism, or by direct injection.

A preferred mode of gene therapy is to provide the polynucleotide in such a way that it will
replicate inside the cell, enhancing and prolonging the interference effect. Thus, the polynucleotide is
operably linked to a suitable promoter, such as the natural promoter of the corresponding gene, a
heterologous promoter that is intrinsically active in cancer cells, or a heterologous promoter that can
be induced by a suitable agent. Preferably, the construct is designed so that the polynucleotide
sequence operably linked to the promoter is complementary to the sequence of the corresponding
gene. Thus, once integrated into the cellular genome, the transcript of the administered
polynucleotide will be complementary to the transcript of the gene, and capable of hybridizing with it.
This approach is known as antisense therapy. See, for example, Culver et al. and Roth.

The use of antibodies embodied in this invention in the treatment of cancer partly relies on the
fact that genes that show RNA overabundance in cancer frequently encode cell-surface proteins.
Location of these proteins at the cell surface may correspond to an important biological function of the
cancer cell, such as their interaction with other cells, the modulation of other cell-surface proteins, or
triggering by an incoming cytokine.

These mechanisms suggest a variety of ways in which a specific antibody may be effective in
decreasing the pathology of a cancer cell. For example, if the gene encodes a growth receptor,
then an antibody that blocks the ligand binding site or causes endocytosis of the receptor would
decrease the ability of the receptor to provide its signal to the cell. It is unnecessary to have
knowledge of the mechanism beforehand; the effectiveness of a particular antibody can be predicted
empirically by testing with cultured cancer cells expressing the corresponding protein. Monoclonal
antibodies may be more effective in this form of cancer therapy if several different clones directed at
different determinants of the same cancer-associated gene product are used in combination: see PCT
application WO 94/00136 (Kasprzyk et al.). Such antibody treatment may directly decrease the
pathology of the cancer cells, or render them more susceptible to non-specific cytotoxic agents such
as platinum (Lippman).

Another example of how antibodies can be used in cancer therapy is in the specific targeting
of effector components. The protein product of the cancer-associated gene is expected to appear in
high frequency on cancer cells compared to unaffected cells, due to the overabundance of the
corresponding RNA. The protein therefore provides a marker for cancer cells that a specific antibody
can bind to. An effector component attached to the antibody therefore becomes concentrated near
the cancer cells, improving the effect on those cells and decreasing the effect on non-cancer cells.
This concentration would generally occur not only near the primary tumor, but also near cancer cells
that have metastasized to other tissue sites. Furthermore, if the antibody is able to induce
endocytosis, this will enhance entry of the effector into the cell interior.

For the purpose of targeting, an antibody specific for the protein of the cancer-associated
gene is conjugated with a suitable effector component, preferably by a covalent or high-affinity bond.
Suitable effector components in such compositions include radionuclides such as radioactive iodine,
toxic chemicals such as vincristine, and toxic peptides such as diphtheria toxin. Other suitable effector
components include peptides or polynucleotides capable of altering the phenotype of the cell in a
desirable fashion: for example, installing a tumor suppressor gene, or rendering them susceptible to
immune attack.

In most applications of antibody molecules in human therapy, it is preferable to use human
monoclonals, or antibodies that have been humanized by techniques known in the art. This helps
prevent the antibody molecules themselves from becoming a target of the host's immune system.

An example of how polypeptides embodied in this invention can be effectively used in
treatment is through vaccination. The growth of cancer cells is naturally limited in part due to immune
surveillance. This refers to the recognition of cancer cells by immune recognition units, particularly
antibodies and T cells, and the consequent triggering of immune effector functions that limit tumor
progression. Stimulation of the immune system using a particular tumor-specific antigen enhances
the effect towards the tumor expressing the antigen. Thus, an active vaccine comprising a
polypeptide encoded by the cDNA of this invention would be appropriately administered to subjects
having overabundance of the corresponding RNA. There may also be a prophylactic role for the
vaccine in a population predisposed for developing cancer cells with overabundance of the same
RNA.

Ways of increasing the effectiveness of cancer vaccines are known in the art (Beardsley,
MacLean et al.). For example, synthetic antigens are conjugated to a carrier like keyhole limpet
hemocyanin (KLH), and then combined with an adjuvant such as DETOX™, a mixture of
mycobacterial cell walls and lipid A. Any polypeptide encoded by the four novel genes described in
this invention can be used in analogous compositions.

Methods for preparing and administering polypeptide vaccines are known in the art. Peptides
may be capable of eliciting an immune response on their own, or they may be rendered more
immunogenic by chemical manipulation, such as cross-linking or attaching to a protein carrier like
KLH. Preferably, the vaccine also comprises an adjuvant, such as alum, muramyl dipeptides,
liposomes, or DETOX™. The vaccine may optionally comprise auxiliary substances such as wetting
agents, emulsifying agents, and organic or inorganic salts or acids. It also comprises a
pharmaceutically acceptable excipient which is compatible with the active ingredient and appropriate
for the route of administration. The desired dose for peptide vaccines is generally from 10 µg to 1 mg,
with a broad effective latitude. The vaccine is preferably administered first as a priming dose, and
then again as a boosting dose, usually at least four weeks later. Further boosting doses may be given
to enhance the effect. The dose and its timing are usually determined by the person responsible for
the treatment.

Sequence data and deposits

The foregoing detailed description provides, inter alia, a detailed explanation of how genes
associated with cancer can be identified and their cDNA obtained. Polynucleotide sequences for
CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1 are provided.

The sequence data listed in this application were obtained by two-directional sequencing,
except where indicated otherwise. The data are believed to be accurate; nevertheless, it is readily
appreciated that the techniques of the art as used herein have the potential of introducing occasional
and infrequent sequence errors. Clones and inserts obtained via PCR may also comprise occasional
errors introduced during amplification. Nucleotide sequences predicted from database compilations,
and sequence data obtained by one-directional sequencing, may also contain occasional errors in
accordance with the limitations of the underlying techniques. In addition, allelic variations to both
nucleotide and amino acid sequences may occur naturally or be deliberately induced. Differences of
any of these types between the sequences provided herein and the invention as practiced may be
present without departing from the spirit of the invention.

Sequence data for CH8-2a13-1 and CH13-2a12-1 cDNA are believed to comprise the entire
translated coding sequence, and 5' and 3' untranslated regions corresponding to those found in
typical mRNA transcripts. Multiple mRNA transcripts may be found depending on the patterns of
transcript processing in various cell types of interest. Sequence data for CH1-9a11-2 and
CH14-2a16-1 cDNA comprise a portion of the coding sequence and 3' untranslated regions.
Additional sequence is typically present in the corresponding mRNA transcripts, comprising an
additional coding region in the N-terminal direction of the protein, and possibly a 5' untranslated
region.

Certain embodiments of this invention may be practiced by polynucleotide synthesis
according to the data provided herein, by rescuing an appropriate insert corresponding to the gene of
interest from one of the deposits listed below, or by isolating a corresponding polynucleotide from a
suitable tissue source. Various useful probes and primers for use in polynucleotide isolation are
provided herein, or may be designed from the sequence data.

Three deposits were made on May 31, 1996 with the American Type Culture Collection
(ATCC), 12301 Parklawn Drive, Rockville, Maryland 20852 under terms of the Budapest Treaty. The
deposits are outlined in Table 2:





TABLE 2: ATCC Deposits

Deposit: BCGF1
Accession No.

Mixture of E. coli with recombinant plasmids of cDNA fragments of genes
associated with breast cancer. The 8 recombinant plasmids may be separated
by plating on ampicillin plates and selecting single colonies for analysis by PCR
using SP6 and T7 primers.

    Gene            Subclone      Expected size of PCR product
    CH1-9a11-2      pch1-1.1      1.1 kb
                    pch1-2.5      2.5 kb
    CH8-2a13-1      pch8-600      600 bp
                    pch8-3k       3.0 kb
                    pch8-4k       4.0 kb
    CH14-2a16-1     pch14-800     800 bp
                    pch14-1.6     1.6 kb
                    pch14-1.3     1.3 kb

Deposit: BCGF2
Accession No. 97595

Mixture of λgt10 recombinant phages with cDNA inserts of genes associated
with breast cancer. The 2 phages may be separated by growing in the E. coli
host (strain NM514) and plating out for single plaques. These plaques can be
distinguished by PCR using λgt10 reverse and forward primers.

    Gene            Phage         Expected size of PCR product
    CH13-2a12-1     λch13-3.5     3.5 kb
    CH14-2a16-1     λch14-2.5     2.5 kb

Deposit: λBCBT474
Accession No. 97594

cDNA library derived from breast cancer cell line BT474 in λgt10 vector,
supplemented with a cDNA library from breast cancer cell line 600PE in λgt10
vector. The cDNA insert sizes range from about 0.5 to 5 kb.
λBCBT474 is a source of additional cDNA inserts corresponding to
CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, or CH14-2a16-1 not present in
BCGF1 or BCGF2.



Sequence databases contain sequences of polynucleotide and polypeptide fragments with
various degrees of identity and overlap with certain embodiments of this invention. The following list
of accession numbers is provided for the interest of the reader; it is not intended to be comprehensive
or a limitation on the invention. The database disclosures do not typically indicate use in cancer
diagnosis, drug development, or disease treatment.

The following GenBank accession numbers are listed in relation to CH1-9a11-2: dbEST
N32686; N45113; N36176; N22982; AA278830; H88670; AA235936; AA236951; H26301; N28026;
H88063; H88064; D61948; H88718; H26460; AA137920; AA145308; W12952; AA200687; N44164;
T27279; dbSTS G22044; G04961.

The following GenBank accession numbers are listed in relation to CH8-2a13-1: dbNR
083760.

The following GenBank accession numbers are listed in relation to CH13-2a12-1: dbNR
U58090; dbEST AA182441; AA253924; AA179755; AA112715; AA112640; W67977; AA160317;
WB8080; AA150243; AA100446; W69636; H46574; AA245889; AA100651; H77368; AA192778;
T85671; N32682; T86257; T78239; T77874; AA187865; Z33557; R40816; N99802; R19302;
AA100650; N55904; AA257151; H77369; T79014.

The following GenBank accession numbers are listed in relation to CH14-2a16-1: dbEST
N64802; \Affi6903; N31400; W95674; AA233551; AA233636; N24105; W03447; W25821; AA233666;
AA233647; N67843; D55778; T66839; N55370; N75650; AA280736; H97110; 219643; H91250;
AA230765; R930d9; T84665; W94857; R92873.

The examples presented below are provided as a further guide to a practitioner of ordinary
skill in the art and are not meant to be limiting in any way.

EXAMPLES 

Example 1: Selecting cDNA for messenger RNA that is overabundant in breast cancer cells

Total RNA was isolated from each breast cancer cell line or control cell by centrifugation
through a gradient of guanidine isothiocyanate/CsCl. The RNA was treated with RNase-free DNase
(Promega, Madison, WI). After extraction with phenol-chloroform, the RNA preparations were stored
at -70°C. Oligo-dT polynucleotides for priming at the 3' end of messenger RNA with the sequence
T11NM (where N ∈ {A,C,G} and M ∈ {A,C,G,T}) were synthesized according to standard protocols.
Arbitrary decamer polynucleotides (OPA01 to OPA20) for priming towards the 5' end were purchased
from Operon Biotechnology, Inc., Alameda, CA.
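
The anchored primer family described above can be enumerated mechanically. The following is a minimal sketch, assuming a run of 11 thymidines followed by the two anchor bases N and M; the function name `anchored_oligo_dt_primers` is hypothetical and is used only for illustration.

```python
from itertools import product


def anchored_oligo_dt_primers(t_length=11):
    """Enumerate anchored oligo-dT primers of the form T(n)NM.

    N is any base except T, and M is any of the four bases.
    t_length=11 is an assumption based on the T11 primers named in the text.
    """
    n_bases = "ACG"    # penultimate anchor base: never T
    m_bases = "ACGT"   # final anchor base: any base
    return ["T" * t_length + n + m for n, m in product(n_bases, m_bases)]


primers = anchored_oligo_dt_primers()
print(len(primers))  # 3 choices for N times 4 choices for M
for primer in primers:
    print(primer)
```

Each anchor pair fixes the primer at the junction between the poly-A tail and the preceding two bases of the transcript, which is why different anchor pairs sample different subsets of the mRNA population.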

The RNA was reverse-transcribed using AMV reverse transcriptase (obtained from BRL) and
an anchored oligo-dT primer in a volume of 20 µL, according to the manufacturer's directions. The
reaction was incubated at 37°C for 60 min and stopped by incubating at 95°C for 5 min. The cDNA
obtained was used immediately or stored frozen at -70°C.

Differential display was conducted according to the following procedure: 1 µL of cDNA was
replicated in a total volume of 10 µL PCR mixture containing the appropriate T11NM sequence, 0.5 µM
of a decamer primer, 200 µM dNTP, 5 µCi [35S]-dATP (Amersham), Taq polymerase buffer with 2.5
mM MgCl2, and 0.3 unit Taq polymerase (Promega). Forty cycles were conducted in the following
sequence: 94°C for 30 sec, 40°C for 2 min, 72°C for 30 sec; and then the sample was incubated at
72°C for 5 min. The replicated cDNA was separated on a 6% polyacrylamide sequencing gel. After
electrophoresis, the gel was dried and exposed to X-ray film.

The autoradiogram was analyzed for labeled cDNA that was present in larger relative amount
in all of the lanes corresponding to breast cancer cells, compared with all of the lanes corresponding
to control cells. Figure 1 provides an example of an autoradiogram from such an experiment. Lane
1 is from non-proliferating normal breast cells; lane 2 is from proliferating normal breast cells; lanes
3 to 5 are from breast cancer cell lines BT474, SKBR3, and MCF7. The left and right sides show
the patterns obtained from experiments using the same T11NM sequence (T11AC), but two different
decamer primers. The arrows indicate the cDNA fragments that were more abundant in all three
tumor lines compared with controls.

The assay illustrated in Figure 1 was conducted using different combinations of oligo-dT
primers and decamer primers. A number of differentially expressed bands were detected when
different primer combinations were used. However, not all differences seen initially were
reproducible after re-screening. We therefore routinely repeated each differential display for each
primer combination. Only bands showing RNA overabundance in at least 2 experiments were
selected for further analysis.
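
The reproducibility rule above amounts to a simple count across repeated displays. The sketch below is illustrative only: the band labels are invented placeholders, and `reproducible_bands` is a hypothetical helper, not part of the procedure as described.

```python
from collections import Counter


def reproducible_bands(experiments, min_hits=2):
    """Return band labels scored as overabundant in at least `min_hits` repeats.

    `experiments` is a list of sets; each set holds the band labels scored as
    overabundant in one repeat of the same primer combination.
    """
    counts = Counter(band for bands in experiments for band in set(bands))
    return sorted(band for band, n in counts.items() if n >= min_hits)


# Placeholder labels only; these are not data from the experiments described.
repeats = [{"band_a", "band_b"}, {"band_a", "band_c"}, {"band_a", "band_b"}]
print(reproducible_bands(repeats))  # ['band_a', 'band_b']
```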

It is preferable to include in the differential display experiment RNA derived from uncultured
normal mammary epithelial cells (termed "organoids"). These cells are obtained from surgical
samples resected from healthy breast tissue, which are then coaxed apart by blunt dissection
techniques and mild enzyme treatment. Using organoids as the negative control, 33 cDNA
fragments were isolated from 15 displays.

Example 2: Sub-selecting cDNA that corresponds to genes that are duplicated in breast
cancer cells

cDNA fragments that were differentially expressed in the fashion described in Example 1
were excised from the dried gel and extracted by boiling at 95°C for 10 min. Eluted cDNA was
recovered by ethanol precipitation, and replicated by PCR. The product was cloned into the pCRII
vector using the TA cloning system (Invitrogen).

EcoRI digested placenta DNA, and EcoRI digested DNA from the breast cancer cell lines
BT474, SKBR3 and ZR-75-30 were used to prepare Southern blots to screen the cloned cDNA
fragments. The cloned cDNA fragments were labeled with [32P]-dCTP, and used individually to probe
the blots. A larger relative amount of binding of the probe to the lanes corresponding to the cancer
cell DNA indicated that the corresponding gene had been duplicated in the cancer cells. The labeled
cDNA probes were also used in Northern blots to verify that the corresponding RNA was
overabundant in the appropriate cell lines.

To determine whether the cDNA fragments obtained by this selection procedure
corresponded to novel genes, a partial nucleotide sequence was obtained using M13 primers.
Each sequence was compared with the known sequences in GenBank. In initial experiments, 5 of
the first 7 genes sequenced were mitochondrial genes. To avoid repeated isolation of
mitochondrial genes, subsequent screening experiments were done with additional lanes in the
DNA blot analysis for EcoRI digested and HindIII digested mitochondrial DNA. Any cDNA fragment
that hybridized to the appropriate mitochondrial restriction fragments was suspected of
corresponding to a mitochondrial gene, and not analyzed further.

From the 33 cDNA fragments detected from differential displays using organoid mRNA, 12
were subcloned. Of these 12, 6 detected suitable gene duplications in the appropriate cell lines.
Three cDNA failed to detect duplicated genes, and 3 appeared to correspond to mitochondrial
genes. Sequence analysis of the 6 suitable cDNA fragments showed no identity to any known
genes.

To obtain longer cDNA corresponding to the cDNA fragments with novel sequences, the
fragments were used as probes to screen a cDNA library from breast cancer cell line BT474,
constructed in lambda GT10. The longer cDNA obtained from lambda GT10 were sequenced
using lambda GT10 primers. The chromosomal locations of the cDNAs were determined using
panels of somatic cell hybrids.

Four of the 6 novel cDNA identified so far have been processed in this fashion. The 
probes used to obtain the 4 new breast cancer genes are shown in Table 3. 



TABLE 3: Primers used for Differential Display

    cDNA            Oligo-dT primer           Arbitrary primer
    CH1-9a11-2      T11CC (SEQ ID NO:9)       SEQ ID NO:11
    CH8-2a13-1      T11AC (SEQ ID NO:10)      SEQ ID NO:12
    CH13-2a12-1     T11AC (SEQ ID NO:10)      SEQ ID NO:13
    CH14-2a16-1     T11AC (SEQ ID NO:10)      SEQ ID NO:14


Example 3: Using the cDNA to test panels of breast cancer cells

To determine the proportion of breast cancers in which the putative breast cancer genes
were duplicated, or showed RNA overabundance without gene duplication, the four cDNA obtained
according to the selection procedures described were used to probe a panel of breast cancer cell
lines and primary tumors.

Gene duplication was detected either by Southern analysis or slot-blot analysis. For
Southern analysis, 10 µg of EcoRI digested genomic DNA from different cell lines was
electrophoresed on 0.8% agarose and transferred to a HYBOND™ N+ membrane (Amersham). The
filters were hybridized with 32P-labeled cDNA for the putative breast cancer gene. After an
autoradiogram was obtained, the probe was stripped and the blot was re-probed using a reference
probe to adjust for differences in sample loading. Either chromosome 2 probe D2S5 or chromosome
21 probe D21S6 was used as a reference. Densities of the signals on the autoradiograms were
obtained using a densitometer (Molecular Dynamics). The density ratio between the breast cancer
gene and the reference gene was calculated for each sample. Two samples of placental DNA digests
were run in each Southern analysis as a control.

For slot-blot analysis, 1 µg of genomic DNA was denatured and slotted on the HYBOND™
membrane. D21S5 or human repetitive sequences were used as reference probes for slot blots. The
density ratio between the breast cancer gene and the reference gene was calculated for each sample.
10-15 samples of placental DNA digests were used as control. Amongst the control samples, the
highest density ratio was set at 1.0. The density ratios of the tumor cell lines were standardized
accordingly. An arbitrary cut-off for the standardized ratio (typically 1.3) was defined to identify
samples in which the putative gene had been duplicated. Each of the cell lines in the breast cancer
panel was scored positively or negatively for duplication of the gene being tested.
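
The standardization and scoring just described can be written out as a short calculation. This is a minimal sketch under stated assumptions: the densitometer readings are made-up numbers, the function name `standardized_ratios` is hypothetical, and treating a ratio equal to the cut-off as positive is a choice made here, not something the text specifies.

```python
def standardized_ratios(tumor, controls, cutoff=1.3):
    """Score samples for gene duplication from densitometry readings.

    Each sample is a (gene_density, reference_density) pair. Every tumor ratio
    is divided by the highest control ratio, so that the highest control
    corresponds to 1.0. The default cutoff of 1.3 is the typical value
    mentioned in the text.
    """
    highest_control = max(gene / ref for gene, ref in controls)
    results = {}
    for name, (gene, ref) in tumor.items():
        ratio = (gene / ref) / highest_control
        results[name] = (round(ratio, 2), ratio >= cutoff)
    return results


# Made-up readings for illustration; not data from the experiments described.
controls = [(100.0, 100.0), (90.0, 100.0), (80.0, 100.0)]
tumor = {"line_x": (250.0, 100.0), "line_y": (110.0, 100.0)}
print(standardized_ratios(tumor, controls))
# {'line_x': (2.5, True), 'line_y': (1.1, False)}
```

Dividing by the highest control ratio, rather than the mean, is the conservative choice: a tumor sample scores positive only if it exceeds every control by the cut-off factor.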

Some of the cell lines in the panel were known to have duplicated chromosomal regions from
comparative genomic hybridization analysis. In instances where the cDNA being used as probe
mapped to the known amplified region, the cDNA indicated that the corresponding gene had also
been duplicated. However, duplicated genes were also detected using each of the four cDNAs in
instances where comparative genomic hybridization had not revealed any amplification.

Because of the nature of the technique, the standardized ratio calculated as described
underestimates the gene copy number, although it is expected to rank in the same order. For
example, the standardized ratio obtained for the c-myc gene in the SKBR3 breast cancer cell was 5.0.
However, it is known that SKBR3 has approximately 50 copies of the c-myc gene.

To test for overabundance of RNA, 10 µg of total RNA from breast cancer cell lines or primary
breast cancer tumors were electrophoresed on 0.8% agarose in the presence of the denaturant
formamide, and then transferred to a nylon membrane. The membrane was probed first with
32P-labeled cDNA corresponding to the putative breast cancer gene, then stripped and reprobed with
32P-labeled cDNA for the beta-actin gene to adjust for differences in sample loading. Ratios of
densities between the candidate gene and the beta-actin gene were calculated. RNA from three
different cultured normal epithelial cells were included in the analysis as a control for the normal level
of gene expression. The highest ratio obtained from the normal cell samples was set at 1.0, and the
ratios in the various tumor cells were standardized accordingly.

Example 4: Chromosome 1 gene CH1-9a11-2

One of the cDNA obtained through the selection procedures of Examples 1 and 2
corresponded to a gene that mapped to Chromosome 1.

Table 4 summarizes the results of the analysis for gene duplication and RNA overabundance.
Both quantitative and qualitative assessment is shown. The numbers shown were obtained by
comparing the autoradiograph intensity of the hybridizing band in each sample with that of the
controls. Several control samples were used for the gene duplication experiments, consisting of
different preparations of placental DNA. The control sample with the highest level of intensity was
used for standardizing the other values. Other sources used for this analysis were breast cancer cell
lines with the designations shown. For reasons stated in Example 3, the quantitative number is not a
direct indication of the gene copy number, although it is expected to rank in the same order. Similarly,
up to 6 control samples were used for the RNA overabundance experiments, consisting of different
preparations of breast cell organoids which had been maintained briefly in tissue culture until the
experiment was performed. The control sample with the highest level of intensity was used for
standardizing the other values. Each cell line was scored + or - according to an arbitrary cut-off value.






TABLE 4: Chromosome 1 Gene in Breast Cancer Cell Lines

                   CH1-9a11-2             CH1-9a11-2 RNA Overabundance **
                   Gene Duplication *     5[illegible] species    4.4 kb species
    Source         Score   Ratio          Score   Ratio           Score   Ratio
    Normal                 1.00                   1.00                    1.0
    BT474                  [illegible]                            +       3.7
    [illegible]                                   nd                      nd
    MDA453         +       2.0            +       5.79            +       6.2
    MDA435                 3.72                   0.89            +       2.4
    SKBR3          +       1.86                   0.94            +       2.9
    600PE          +       1.72           +       4.47            +       6.8
    MDA157                 1.49                   1.08                    1.4
    MCF7           +       1.95                   nd                      nd
    DU4475         +       2.02                   1.13            +       1.5
    MDA231                 1.23           +       1.47
    BT20                   1.09                   0.83            +       1.9
    T47D                   1.05                   nd                      nd
    UACC812                0.67                   1.57            +       1.8
    MDA134                 1.19           +       5.04            +       7.1
    CAMA-1                 1.02                   2.51                    7.2
    Incidence      9/15 (60%)             7/12 (58%)              11/12 (92%)

+ = gene duplication or RNA overabundance; - = no duplication or overabundance; nd = not done

* Degree of gene duplication is reported relative to placental DNA preparations.

** Degree of RNA overabundance is reported relative to the highest level observed for
several cultures of normal epithelial cells. Two hybridizing species of RNA
are calculated and reported separately.



The gene corresponding to the CH1-9a11-2 cDNA was duplicated in 9 out of 15 (60%) of the
breast cancer cell lines tested, compared with placental DNA digests (P3 and P12). The sequence of
the 115 bases from the 5' end of the cDNA fragment (SEQ. ID NO:1) is shown in Figure 22. There
was no substantial homology to any known gene in GenBank. One of the three possible reading
frames was found to be open, with the predicted amino acid sequence shown in Figure 22 (SEQ. ID NO:2).




The CH1-9a11-2 gene was further characterized by obtaining additional sequence
information. A λ-GT10 cDNA library from the breast cancer cell line BT474 (Example 2) was
screened using the initial cDNA insert and a clone with a 2.5 kilobase insert was identified. The
identified clone was subcloned into plasmid vector pCRII. T7 and Sp6 primers for regions flanking the
cDNA inserts were used as initial sequencing primers:

T7 primer (SEQ. ID NO:42)
5'-TAATACGACTCACTATAGGGAGA-3'

Sp6 primer (SEQ. ID NO:43)
5'-CATACGATTTAGGTGACACTATAG-3'

Sequencing continued by walking along the region of interest by standard techniques, using
sequencing primers based on data already obtained. Primers used in sequencing are designated 1-16
in Figure 7.

A second clone (designated pCH1-1.1) overlapping on the 5' end was obtained using
CLONTECH Marathon™ cDNA Amplification Kit. A map showing the overlapping regions is provided
in Figure 6. Briefly, two DNA primers designated CH1a and CH1b (Figure 7) were synthesized.
Polyadenylated RNA from breast cancer cell line 600PE was reverse transcribed using CH1b primer.
After second strand synthesis, adaptor DNA provided in the kit was ligated to the double-stranded
cDNA. The 5' end cDNA of CH1-9a11-2 was then amplified by PCR using primers CH1a and AP1
(provided in the kit). To increase the specificity of the PCR products, the first PCR products were
PCR reamplified using nested primers CH1a and AP2 (provided in the kit). The PCR products were
cloned into pCRII vector (Invitrogen) and screened with CH1-9a11-2 probe.

The sequence of 3452 base pairs between the 5' end of pCH1-1.1 and the poly-A tail of
CH1-9a11-2 was determined by standard sequencing techniques. The DNA sequence is shown in Figure
8 (SEQ. ID NO:15). The longest open reading frame is in frame 1 (bases 1-1875), and codes for 624
amino acids before the stop codon. The corresponding amino acid sequence of this frame is shown
in the upper panel of Figure 9 (SEQ. ID NO:16). The partial sequence predicted for the translated
protein is listed in the lower panel of Figure 9 (SEQ. ID NO:17). Bases 1876 to the end of the sequence
are believed to be a 3' untranslated region. A hydrophobicity analysis identified a putative membrane
insertion or membrane spanning region at about amino acids 382-400, indicated in Figure 9 by
underlining.
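
The text does not state which hydrophobicity method was used. As an illustration only, the following sketch scans a protein with a sliding window using the Kyte-Doolittle hydropathy scale. The choice of scale, the 19-residue window, the 1.6 threshold and the toy peptide are all assumptions and are not taken from the source.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full-length window along the protein."""
    values = [KD[aa] for aa in protein]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """1-based (start, end) positions of windows whose mean exceeds the threshold."""
    profile = hydropathy_profile(protein, window)
    return [(i + 1, i + window) for i, score in enumerate(profile) if score > threshold]


# Hypothetical toy peptide: charged flanks around a 19-residue hydrophobic stretch.
toy = "DEKRDEKRDE" + "LLVVIILLAAVVLLIIGAL" + "KRDEKRDEKR"
hits = hydrophobic_windows(toy)
```

Applied to a real translation such as SEQ. ID NO:16, a run of consecutive windows above the threshold would mark a candidate membrane-spanning segment; the positions reported depend on the window and threshold chosen.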

Figure 23 is a listing of additional cDNA sequence obtained for CH1-9a11-2, comprising
approximately 1934 base pairs 5' from the sequence of Figure 8. The additional sequence data was
obtained by rescuing and amplifying two further fragments of CH1-9a11-2 cDNA. Nested primers
were designed ~100 base pairs downstream from the 5' end of the known sequence. The primers
were used in a nested amplification assay using AP1 and AP2, using the CLONTECH Marathon™
cDNA Amplification Kit as described above. The template for the first upstream fragment was
reverse-transcribed polyadenylated RNA from breast cancer cell line 600PE, as described earlier.
This fragment was sequenced, and another set of nested primers was designed. The template for the
next upstream fragment was a Marathon™ ready cDNA preparation from human testes, also supplied
by CLONTECH.

The nucleotide sequence shown in Figure 23 comprises an open reading frame through to
the 5' end. Figure 24 shows the corresponding protein translation. Between about 500 and 1000
additional bases are predicted to be present in CH1-9a11-2 in the 5' direction, with the protein
encoding sequence beginning somewhere within this additional sequence. Sequencing of the encoding
region is completed by obtaining additional CH1-9a11-2 fragments in this direction.

A GENINFO® BLAST search of nucleotide and peptide sequence databases was performed
through the National Center for Biotechnology Information on February 23, 1996. Short segments of
homology with other reported human sequences were found at the nucleotide level (<500 base pairs),
but none with any ascribed function in the respective identifier. At the amino acid level, no identity
higher than 30% was found with any reported eukaryotic sequences.

A CH1-9a11-2 cloned insert has been used to probe the level of relative expression in
polyadenylated RNA from a panel of tissue sources. The RNA was obtained already prepared for
Northern blot analysis (CLONTECH Catalog # 7759-1, 7760-1 and 7756-1). The manufacturer
produced the blots from approximately 2 µg of poly-A RNA per lane, run on a denaturing
formaldehyde 1.2% agarose gel, transferred to a nylon membrane, and fixed by UV irradiation. The
relative CH1-9a11-2 expression observed at the RNA level is shown in Table 5:

TABLE 5: Northern blot analysis

Tissue               CH1-9a11-2 mRNA

heart                ++
brain                +
placenta             ++
lung                 +/-
liver                +/-
skeletal muscle      +
kidney               +/-
pancreas             +++
spleen               +
thymus               +
prostate
testis               +++
ovary
small intestine      +
colon
peripheral blood     +/-

++++  Very high
+++   High
++    Medium
+     Low
+/-   Very low



Relatively elevated levels of expression were observed in heart, placenta, pancreas, prostate, testis
and ovary. The level of expression in breast cancer cell lines is also relatively high (about ++++ on
the scale), since the Northern analysis performed on these lines (described above) was conducted on
total cellular RNA, of which polyadenylated RNA constitutes only about 5%. It is likely that the
CH1-9a11-2 gene is involved in a biological process that is typical to the tissue types showing medium to
high levels of expression, which may relate to increased tissue growth or metabolism.

Since the obtained sequence is shorter than the apparent size of mRNA observed in
Northern analysis (Table 1), an additional polynucleotide segment is believed to be present at the 5'
end of the sequence shown in SEQ. ID NO:15. Further sequence data at the 5' end is deduced by
obtaining additional cloned cDNA using standard techniques. Briefly, in one approach, mRNA from
breast cancer cell lines MDA453 and/or 600PE is cloned and screened using primers based on
sequence data from SEQ. ID NO:15. Two nested primers of about 20 nucleotides are prepared, the
innermost about 150 base pairs from the 5' end, and the outermost about 170 base pairs from the 5'
end. The outermost primer is used to synthesize a first cDNA strand complementary to the mRNA in
the upstream direction. Second strand synthesis is performed using reagents in a CLONTECH
Marathon™ cDNA amplification kit according to manufacturer's directions. The double-stranded DNA
is then ligated at the 5' end of the coding sequence with the double-stranded adaptor fragment
provided in the kit. A first PCR amplification (about 30 cycles) is performed using the first adapter
primer from the kit and the outermost RNA-specific primer, and a second amplification (about 30
cycles) is performed using the second adapter primer and the innermost RNA-specific primer. In an
alternative approach, a CLONTECH RACE-READY single-stranded cDNA from human placenta is
PCR amplified using nested 5' anchor primers in combination with the outermost and innermost
RNA-specific primers. Amplified DNA obtained using either approach is analyzed by gel electrophoresis,
and cloned into plasmid vector pCRII. Clones are screened, as necessary, using the 2.5 kilobase
CH1-9a11-2 insert. Clones corresponding to full-length mRNA (4.5 kb or 5.5 kb; Table 1), or cDNA
fragments overlapping at the 5' end are selected for sequencing. Compared with the 4.5 kb form,
additional polynucleotide segments may be present in the 5.5 kb form within the encoding region, or in
the 5' or 3' untranslated region.
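
The placement of the two nested gene-specific primers can be sketched as follows. This is an illustration under stated assumptions: the offsets are taken as the start of each 20-nucleotide binding window measured from the 5' end of the known sequence, the primers are written as the reverse complement of the sense strand so that they extend toward the unknown 5' region, and the placeholder sequence is hypothetical rather than SEQ. ID NO:15.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def nested_antisense_primers(known_cdna, inner_offset=150, outer_offset=170, length=20):
    """Return (outermost, innermost) antisense primers, each written 5' to 3'.

    Assumption: each offset is the 0-based start of the primer-binding window,
    counted from the 5' end of the known sense-strand sequence.
    """
    if outer_offset + length > len(known_cdna):
        raise ValueError("known sequence is too short for the requested primers")
    outer = reverse_complement(known_cdna[outer_offset:outer_offset + length])
    inner = reverse_complement(known_cdna[inner_offset:inner_offset + length])
    return outer, inner


# Hypothetical placeholder for the 5' portion of a known cDNA sequence.
known = "ACGTTGCA" * 30
outer_primer, inner_primer = nested_antisense_primers(known)
```

With these offsets the innermost primer binds closer to the 5' end than the outermost one, so the second amplification yields a slightly shorter product lying entirely within the first.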

Example 5: Chromosome 8 gene CH8-2a13-1

One of the cDNAs obtained corresponded to a gene that mapped to Chromosome 8. Figure 2
shows the Southern blot analysis for the corresponding gene in various DNA digests. Lane 1 (P12) is
the control preparation of placental DNA; the rest show DNA obtained from human breast cancer cell
lines. Panel A shows the pattern obtained using the 32P-labeled CH8-2a13-1 cDNA probe. Panel B
shows the pattern obtained with the same blot using the 32P-labeled D2S6 probe as a loading control.
The sizes of the restriction fragments are indicated on the right.

Figure 3 shows the Northern blot analysis for RNA overabundance. Lanes 1-3 show the level
of expression in cultured normal epithelial cells. Lanes 4-19 show the level of expression in human
breast cancer cell lines. Panel A shows the pattern obtained using the CH8-2a13-1 probe; panel B
shows the pattern obtained with beta-actin cDNA, a loading control.

The results are summarized in Table 6. The scoring method is the same as for Example 4. 
TABLE 6: Chromosome 8 Genes
in Breast Cancer Cell Lines

Source

CH8-2a13-1
Gene Duplication

CH8-2a13-1
RNA Overabundance

c-myc
Gene Duplication



Normal 

SKBR3 

ZR-75-30 

BT474 

MDA157 

MCF7 

CAMA-1 

MDA361 

MDA468

T47D 

MDA453 

MDA134 

MDA435 

600PE 

UACC812 

MDA231 
DU4475 
BT468 

BT20 



Incidence:



+ 
+ 
+ 

+ 
+ 
+ 
+ 
nd 

+ 
+ 
+ 



1.00* 
4.25 
3.82 
1.53 
2.02 
1.84 
3.62 
2.00 



1.41 
1.83 
1.30 
2.15
0.95 
1.25 

0.80 
0.85 
0.37 
0.95 



12/17
(71%)



+
nd 

+ 

+ 
+ 

+ 
+ 
+ 
+ 
+ 



1.00- 
4.30 

1.72 
3.39 
4.92 
2.14 
1.74 
4.50 

1.58 
3.10 
3.70 
4.94 
2.04 
2.40 

1.28 
0.88 
0.70 
0.82 



14/17 
(82%) 



+ 
+ 

+ 

+ 
nd 
nd 



1.00* 
4.73 
2.24 
1.76 
1.39 
3.10 
1.61 



1.02 
0.90 
0.88 
1.00 
0.54 
0.74 

1.27 
0.50 
0.23 



7/ie 



+ Gene duplication or RNA overabundance; - no duplication or overabundance; nd = not done
* Degree of gene duplication is reported relative to placental DNA preparations.



The gene corresponding to CH8-2a13-1 showed clear evidence of duplication in 12 out of 17
(71%) of the cells tested. RNA overabundance was observed in 14 out of 17 (82%). Thus, 11% of
the cells had achieved RNA overabundance by a mechanism other than gene duplication.
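
The arithmetic behind these incidence figures can be checked with a short sketch. The function name is illustrative, and rounding to whole percentages is an assumption made to match the reported values.

```python
def percent(positive, tested):
    """Incidence as a whole-number percentage."""
    return round(100 * positive / tested)


duplication = percent(12, 17)     # lines scored + for gene duplication
overabundance = percent(14, 17)   # lines scored + for RNA overabundance

# The difference between the two incidences; it is the smallest share of lines
# that can show overabundance without duplication, reached when every line with
# duplication also shows overabundance.
other_mechanism = overabundance - duplication
```

The same calculation applied to 7 and 13 positive lines out of 16 gives 44%, 81% and a difference of 37%.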

Since the known oncogene c-myc is located on Chromosome 8, the Southern analysis was
also conducted using a probe for c-myc. At least 2 of the breast cancer cells showing duplication of
the gene corresponding to CH8-2a13-1 did not show duplication of c-myc. This indicates that
the gene corresponding to CH8-2a13-1 is not part of the myc amplicon.

The sequence of 150 bases from the 5' end of the cDNA fragment is shown in Figure 22
(SEQ ID NO:3). There was no substantial homology to any known gene in GenBank. One of the
three possible reading frames was found to be open, with the amino acid sequence shown in Figure
22 (SEQ ID NO:4).

The CH8-2a13-1 gene was further characterized by obtaining additional sequence
information. A λ-GT10 cDNA library from the breast cancer cell line BT474 (Example 2) was
screened using the initial cDNA insert and clones with a 3.0 kb and a 4.0 kb insert were identified.
The two identified clones were subcloned into plasmid vector pCRII. T7 and Sp6 primers for regions
flanking the cDNA inserts were used as initial sequencing primers. Sequencing continued by walking
along the region of interest by standard techniques, using sequencing primers based on data already
obtained. The two inserts were found to overlap (Figure 6). Primers used are those designated 1-25
in Figure 10.

A third clone of about 600 bp (designated pCH8-600) overlapping on the 5' end (Figure 6)
was obtained using CLONTECH Marathon™ cDNA Amplification Kit. Briefly, two DNA primers CH8a
and CH8b (Figure 10) were synthesized. Polyadenylated RNA from breast cancer cell line BT474
was reverse transcribed using CH8b primer. After second strand synthesis, adaptor DNA provided in
the kit was ligated to the double-stranded cDNA. The 5' end cDNA of CH8-2a13-1 was then amplified
by PCR using primers CH8a and AP1 (provided in the kit). To increase the specificity of the PCR
products, the first PCR products were PCR reamplified using nested primers CH8a and AP2
(provided in the kit). The PCR products were cloned into pCRII vector (Invitrogen) and screened with
CH8-2a13-1 probe.

By sequencing relevant portions of the three clones, a nucleic acid sequence of 3982 base
pairs between the 5' end and the poly-A tail of CH8-2a13-1 was determined. The DNA sequence is
shown in Figure 11 (SEQ. ID NO:18). Bases 1-152 are believed to be a 5' untranslated region. The
longest open reading frame is in frame 3 from base 153 to 3911, and codes for 1252 amino acids
before the stop codon. The corresponding amino acid sequence of this frame is shown in the upper
panel of Figure 12 (SEQ. ID NO:19). The sequence predicted for the translated protein is shown in
the lower panel of Figure 12 (SEQ. ID NO:20).
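
The stated reading frame and amino acid count follow directly from the base coordinates, as the sketch below shows. It assumes that coordinates are 1-based and inclusive and that the last codon of the stated range is the stop codon; the function names are illustrative.

```python
def reading_frame(first_base):
    """Reading frame (1, 2 or 3) of an ORF that starts at a 1-based position."""
    return (first_base - 1) % 3 + 1


def orf_amino_acids(first_base, last_base):
    """Amino acids encoded before the stop codon.

    Assumption: the coordinates are 1-based and inclusive, and the final codon
    in the range is the stop codon.
    """
    length = last_base - first_base + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a whole number of codons")
    return length // 3 - 1


ch8_frame = reading_frame(153)            # frame of the CH8-2a13-1 ORF
ch8_residues = orf_amino_acids(153, 3911)  # residues before the stop codon
```

The same check reproduces the figures given for CH1-9a11-2 in Example 4, where bases 1-1875 in frame 1 encode 624 amino acids.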

A GENINFO® BLAST search of nucleotide and peptide sequence databases was performed
through the National Center for Biotechnology Information on March 26, 1996. The sequences were
found to be about 99% identical at the nucleotide and amino acid level with bases 343-4103 of
KIAA0196 protein (N. Nomura et al., in press; sequence submitted to the DDBJ/EMBL/GenBank
databases on March 4, 1996). The KIAA0196 was one of 200 different cDNA cloned at random from
an immature male human myeloblast cell line. KIAA0196 has no known biological function, and is
described by Nomura et al. as being ubiquitously expressed.

A fourth clone of about 600 bp overlapping pCH8-600 at the 5' end has also been obtained.
Briefly, a DNA primer was synthesized corresponding to about the first 20 nucleotides at the 5' of the
predicted cDNA sequence, and used along with a primer based on the pCH8-600 sequence to
reverse-transcribe RNA from breast cancer cell line BT474. The product was cloned into pCRII vector
(Invitrogen) and screened with a CH8-2a13-1 probe. The new clone is sequenced along both strands
to obtain additional 5' untranslated sequence data for the cDNA. The predicted compiled cDNA
nucleotide sequence of CH8-2a13-1 cDNA is shown in Figure 13 (SEQ. ID NO:21). The
corresponding amino acid sequence of this frame is shown in Figure 14 (SEQ. ID NO:22). A
polynucleotide comprising the compiled sequence is assembled by joining the insert of this fourth
clone to pCH8-4k within the shared region. Briefly, CH8-4k is cut with XbaI and NotI. The fourth
clone is cut with BamHI and XbaI. The ligated polynucleotide is then inserted into pCRII cut with
BamHI and NotI.

A CH8-2a13-1 cloned insert has been used to probe the level of relative expression in
polyadenylated RNA from a panel of tissue sources obtained from CLONTECH, as in Example 4.
The relative CH8-2a13-1 expression observed at the mRNA level is shown in Table 7:



TABLE 7: Northern blot analysis

Tissue               CH8-2a13-1 mRNA

heart                ++
brain                +
placenta             +
lung                 +
liver                +/-
skeletal muscle
kidney               +/-
pancreas
spleen
thymus               +
prostate             +
testis               ++
ovary                +
small intestine
colon                +
peripheral blood

++++  Very high
+++   High
++    Medium
+     Low
+/-   Very low



Relative levels of expression observed were as follows: Low levels of expression were observed in
adult peripheral blood leukocytes (PBL), brain, placenta, lung, liver, skeletal muscle, kidney, and
pancreas. Medium levels of expression were observed in adult heart, spleen, thymus, prostate, testis,
ovary, small intestine, and colon. High levels of expression were observed in four fetal tissues tested:
brain, lung, liver and kidney. The level of expression in breast cancer cell lines is relatively high
(about on the scale), since the Northern analysis performed on these lines was conducted on
total cellular RNA. It is likely that the CH8-2a13-1 gene is involved in a biological process that is
typical to the tissue types showing medium to high levels of expression, which may relate to increased
tissue growth or metabolism.

Example 6: Chromosome 13 gene CH13-2a12-1

One of the cDNAs obtained corresponded to a gene that mapped to Chromosome 13. Figure
4 shows the Southern blot analysis for the corresponding gene in various DNA digests. Lanes 1 and
2 are control preparations of placental DNA; the rest show DNA obtained from human breast cancer
cell lines. Panel A shows the pattern obtained using the CH13-2a12-1 cDNA probe; panel B shows
the pattern using D2S6 probe as a loading control. The sizes of the restriction fragments are
indicated on the right.

Figure 5 shows the Northern blot analysis for RNA overabundance of the CH13-2a12-1 gene.
Lanes 1-3 show the level of expression in cultured normal epithelial cells. Lanes 4-19 show the level
of expression in human breast cancer cell lines. Panel A shows the pattern obtained using the
CH13-2a12-1 probe; panel B shows the pattern obtained with beta-actin cDNA, a loading control. The
apparent size of the mRNA varied depending upon conditions of electrophoresis. Full-length mRNA is
believed to occur at sizes of about 3.2 and 3.5 kb.

The results of the RNA abundance comparison are summarized in Table 8. The scoring
method is the same as for Example 4.
TABLE 8: Chromosome 13 Gene in Breast Cancer Cell Lines

                 CH13-2a12-1            CH13-2a12-1
Source           Gene Duplication       RNA Overabundance

Normal           -    1.00*                  1.00**
600PE                                   +    5.57
BT474            +    1.60                   3.20
SKBR3            +    1.58                   4.25
MDA157           +    2.21                   3.76
CAMA-1           +    1.41              +    1.99
MDA231                1.65                   2.09
T47D             +    1.23                   1.20
MDA468                nd                     6.90
MDA361                nd                     2.59
MDA435                0.59              +    3.41
MDA134                0.53              +    2.59
DU4475                0.75              +    1.79
MDA453                0.89                   1.97
BT20                  0.37                   1.04
MCF7                  0.29                   1.03
UACC812               0.30                   0.39
BT468                 0.47                   nd
ZR-75-30              0.70                   nd
Incidence (%)         7/16 (44%)             13/16 (81%)

+ Gene duplication or RNA overabundance; - no duplication or overabundance; nd = not done
* Degree of gene duplication is reported relative to placental DNA preparations.
** Degree of RNA overabundance is reported relative to the highest level observed for several cultures
of normal epithelial cells.



The gene corresponding to CH13-2a12-1 was duplicated in 7 out of 16 (44%) of the cells
tested. Three of the positive cell lines (600PE, BT474, and MDA435) had been studied previously by
comparative genomic hybridization, but had not shown amplified chromatin in the region where
CH13-2a12-1 has been mapped in these studies.

RNA overabundance was observed in 13 out of 16 (81%) of the cell lines tested. Thus, 37%
of the cells had achieved RNA overabundance by a mechanism other than gene duplication.

Cells from primary breast tumors have also been analyzed for duplication of the
chromosome 13 gene. Ten of the 82 tumors analyzed (12%) were positive, confirming that
duplication of this gene is not an artifact of in vitro culture.

The sequence of 107 bases from the 5' end of the 1.5 kb cDNA fragment is shown in Figure
22 (SEQ ID NO:5). There was no substantial homology to any known gene in GenBank. One of the
three possible reading frames was found to be open, with the predicted amino acid sequence shown
in Figure 22 (SEQ ID NO:6).

The CH13-2a12-1 gene was further characterized by obtaining additional sequence
information. A λ-GT10 cDNA library from the breast cancer cell line BT474 (Example 2) was
screened using the initial cDNA insert, and clones with a 3.5 kilobase and a 1.6 kilobase insert were
identified. The two identified clones were subcloned into plasmid vector pCRII. T7 and Sp6 primers
for regions flanking the cDNA inserts were used as initial sequencing primers. Sequencing continued
by walking along the region of interest by standard techniques, using sequencing primers based on
data already obtained. The two inserts were found to overlap (Figure 6). Primers used during
sequencing are shown in Figure 15.

By sequencing relevant portions of the 3.5 and 1.6 kb clones, a nucleic acid sequence of
3339 base pairs between the 5' end and the poly-A tail of CH13-2a12-1 was determined. The DNA
sequence is shown in Figure 16 (SEQ. ID NO:23). Bases 1-520 are believed to be a 5' untranslated
region. The longest open reading frame is in frame 2 from base 521 to 1838, and codes for 611
amino acids before the stop codon. The corresponding amino acid sequence of this frame is shown
in the upper panel of Figure 17 (SEQ. ID NO:24). The sequence predicted for the translated protein is
shown in the lower panel of Figure 17 (SEQ. ID NO:25). Bases 1838 to 3339 of the nucleotide
sequence are believed to be a 3' untranslated region, which is present in the 3.5 kb insert. The 3.5 kb
insert appears to be a splice variant (Figure 6), in which the 3' untranslated region consists of bases
1838-2797 in the sequence.

A GENINFO® BLAST search of nucleotide and peptide sequence databases was performed
through the National Center for Biotechnology Information on March 26, 1996. Short segments of
homology with other reported human sequences were found at the nucleotide level (<500 base pairs),
but none with any ascribed function in the respective identifier. At the amino acid level, the sequence
was found to share 33% identities and 54% positives with 228 residues of the lin-19 protein of
Caenorhabditis elegans. This protein has been implicated in regulating the cell cycle of C. elegans
(ET Kipreos, W He & EM Hedgecock). The CH13-2a12-1 gene is suspected of a role in controlling
cell proliferation. "Controlling cell proliferation" in this context means that an abnormally high or low
level of gene expression at the RNA or protein level results in a higher or lower rate of cell
proliferation, or vice versa, compared with cells with an otherwise similar phenotype. There is also a
low-level homology between CH13-2a12-1 and VACM-1, a vasopressin-activated, calcium-mobilizing
receptor from rabbit kidney medulla (Burnatowska-Hledin et al). VACM-1 has a transmembrane
sequence, whereas none has been detected in CH13-2a12-1. Nevertheless, it is possible that the
CH13-2a12-1 protein product has a Ca++ binding or Ca++ mobilizing function.

A CH13-2a12-1 cloned insert has been used to probe the level of relative expression in
polyadenylated RNA from a panel of tissue sources obtained from CLONTECH, as in Example 4.
The relative CH13-2a12-1 expression observed at the mRNA level is shown in Table 9:



TABLE 9: Northern blot analysis

Tissue               CH13-2a12-1 mRNA

heart                ++++
brain
placenta             ++
lung                 +
liver
skeletal muscle      ++++
kidney               +
pancreas
spleen
thymus               ++
prostate             ++
testis               +++
ovary                ++
small intestine      ++
colon                +
peripheral blood     +

++++  Very high
+++   High
++    Medium
+     Low
+/-   Very low



Relatively elevated levels of expression were observed in heart, skeletal muscle and testis.
The level of expression in breast cancer cell lines is relatively high (about ++++ on the scale), since
the Northern analysis performed on these lines was conducted on total cellular RNA. It is likely that
the CH13-2a12-1 gene is involved in a biological process that is typical to the tissue types showing
medium to high levels of expression, which may relate to increased tissue growth or metabolism.

Fragments corresponding to the CH13-2a12-1 gene have also been used to screen cell lines
derived from other types of cancer. Southern analysis showed that about 1 out of 4 breast cancer cell
lines tested have gene duplication of CH13-2a12-1. Northern analysis showed that about 3 out of 6
lines tested have overexpression of the corresponding RNA transcript.
Example 7: Chromosome 14 gene CH14-2a16-1

One of the cDNAs obtained corresponded to a gene that mapped to Chromosome 14. Results
of the analysis are summarized in Table 10. The scoring method is the same as for Example 4.



TABLE 10: Chromosome 14 Gene in Breast Cancer Cell Lines

                 CH14-2a16-1            CH14-2a16-1
Source           Gene Duplication       RNA Overabundance

Normal                1.00*                  1.00**
BT474            +    2.89              +    2.57
MCF7             +    1.35              +    1.88
SKBR3            +    2.58              +    2.19
T47D             +    2.28                   nd
MDA157           +    1.52              +    2.52
UACC812          +    2.23                   nd
MDA361                0.97              +    1.43
MDA453                1.58              +    5.92
BT20                                         1.07
600PE                 0.94              +    2.00
MDA231           +    1.66              +    2.19
CAMA-1                0.92                   0.71
DU4475                0.87              +    1.33
BT468                 0.46                   nd
MDA134                0.77              +    7.17
Incidence (%)         8/15 (53%)             10/12 (83%)

+ Gene duplication or overabundance; - no duplication or overabundance; nd = not done
* Degree of gene duplication is reported relative to placental DNA preparations.
** Degree of RNA overabundance is reported relative to the highest level observed for several cultures
of normal epithelial cells.



The gene corresponding to CH14-2a16-1 was duplicated in 8 out of 15 (53%) of the cells
tested. The sequence of 114 bases from the 5' end of the cDNA fragment is shown in Figure 22
(SEQ ID NO:7). There was no substantial homology to any known gene in GenBank. One of the
three possible reading frames was found to be open, with the predicted amino acid sequence shown
in Figure 22 (SEQ ID NO:8).

The CH14-2a16-1 gene was further characterized by obtaining additional sequence
information. A λ-GT10 cDNA library from the breast cancer cell line BT474 (Example 2) was
screened using the initial cDNA insert, and two clones were identified: one with a 1.6 kb insert, and
the other with a 2.5 kb insert. The identified clones were subcloned into plasmid vector pCRII. The
1.6 kb insert was sequenced by using T7 and Sp6 primers for regions flanking the cDNA inserts as
initial sequencing primers. Sequencing continued by walking along the region of interest by standard
techniques, using sequencing primers based on data already obtained. Primers used are those
designated 1-11 in Figure 18.

A third clone (designated pCH14-800) overlapping on the 5' end (Figure 6) was obtained
using CLONTECH Marathon™ cDNA Amplification Kit. Briefly, DNA primers CH14a, CH14b, CH14c
and CH14d (Figure 18) were prepared. Polyadenylated RNA from breast cancer cell line MDA453
was reverse transcribed using CH14b primer. After second strand synthesis, adaptor DNA provided in
the kit was ligated to the double-stranded cDNA. The 5' end cDNA of CH14-2a16-1 was then
amplified by PCR using primers CH14b (or CH14c) and AP1 (provided in the kit). To increase the
specificity of the PCR products, the first PCR products were PCR reamplified using nested primers
CH14a (or CH14d) and AP2 (provided in the kit). The PCR products were cloned into pCRII vector
(Invitrogen) and screened with CH14-2a16-1 probe.

By sequencing pCH14-1.6 and pCH14-800, a nucleic acid sequence of 2021 base pairs
between the 5' end and the poly-A tail of CH14-2a16-1 has been determined. The DNA sequence is
shown in Figure 19 (SEQ. ID NO:26). The longest open reading frame is in frame 1 from base 1 to
792, and codes for 263 amino acids before the stop codon. The corresponding amino acid sequence
of this frame is shown in the upper panel of Figure 20 (SEQ. ID NO:27). The partial sequence
predicted for the translated protein is shown in the lower panel of Figure 20 (SEQ. ID NO:28). The 2.1
kb clone has not been sequenced, but is believed to consist of about the same region of the
CH14-2a16-1 cDNA as pCH14-1.6 and pCH14-800 combined.

A GENINFO® BLAST search of nucleotide and peptide sequence databases was performed
through the National Center for Biotechnology Information on March 26, 1996. Short segments of
homology with other reported human sequences were found at the nucleotide level (<500 base pairs),
but none with any ascribed function in the respective identifier. At the amino acid level, the sequence
was found to share homologies within the first 106 residues with an RNA binding protein from
Saccharomyces cerevisiae with the designation NAB2. NAB2 is one of the major proteins associated
with nuclear polyadenylated RNA in yeast cells, as detected by UV light-induced cross-linking and
immunofluorescence. NAB2 is strongly and specifically associated with nuclear poly(A)+ RNA in vivo.
Gene knock-out experiments have shown that this protein is essential to yeast cell survival
(Anderson et al.). Accordingly, the protein encoded by CH14-2a16-1 is suspected of having DNA or
RNA binding activity.

A fourth clone (pCH14-1.3) has been obtained that overlaps the pCH14-800 clone at the 5'
end (Figure 6). The method of isolation was similar to that for pCH14-800, using primers based on
the pCH14-800 sequence. Partial sequence data for pCH14-1.3 has been obtained by
one-directional sequencing from the 5' and 3' ends of the pCH14-1.3 clone. Figure 21 shows the
nucleotide sequence of the 5' end (SEQ. ID NO:29) and the amino acid translation of
the likely open reading frame (SEQ. ID NO:30); the nucleotide sequence of the 3' end (SEQ. ID
NO:31) and the likely open reading frame (SEQ. ID NO:32). This data is confirmed and additional
sequence between SEQ. ID NOS:29 and 31 is obtained by fully sequencing both strands of
pCH14-1.3. Once compiled, the sequence data from pCH14-1.3, pCH14-800 and pCH14-1.6 may be shorter
than the apparent size of mRNA observed in Northern analysis (Table 1). If necessary, further
sequence data at the 5' end is deduced by obtaining additional cloned cDNA according to approaches
described in this Example or Example 4.

Figure 25 is a listing of additional cDNA sequence obtained for CH14-2a16-1, comprising
approximately 1934 base pairs 5' from the sequence of Figure 19. The corresponding amino acid
translation is shown in the upper panel of Figure 26. The additional sequence data was obtained by
rescuing and amplifying further fragments of CH14-2a16-1 cDNA. Nested primers were designed
~100 base pairs downstream from the 5' end of the known sequence. The primers were used in a
nested amplification assay using AP1 and AP2, using the CLONTECH Marathon™ cDNA
Amplification Kit as described above. The template was a Marathon™ ready cDNA preparation from
human testes, also supplied by CLONTECH.

The nucleotide sequence shown in Figure 25 is closed at the 5' end. The lower panel of
Figure 26 shows what is predicted to be the sequence of the gene product, beginning at the first
methionine residue. The nucleotide sequence shown contains a point difference at the position
indicated by the underlining in Figure 25. A base determined to be A from the previously obtained
polynucleotide fragment was a G in the one used in this part of the experiment. This corresponds to a
change from E (glutamic acid) to G (glycine) in the protein sequence, at the position underlined in
Figure 26. This may represent a natural allelic variation.
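
The relationship between the A to G point difference and the E to G change can be checked against the standard genetic code. The following is a minimal sketch only. It assumes that the affected codon is one of the two glutamic acid codons (GAA or GAG); the text identifies the residues but not the codon, so the codons and the function names used here are illustrative assumptions.

```python
# Partial standard genetic code: only the codons needed for this illustration.
CODON_TABLE = {
    "GAA": "E", "GAG": "E",                          # glutamic acid
    "GGA": "G", "GGG": "G", "GGC": "G", "GGT": "G",  # glycine
}

def substitute(codon: str, position: int, base: str) -> str:
    """Return the codon with the base at a 0-based position replaced."""
    return codon[:position] + base + codon[position + 1:]

def a_to_g_changes(codon: str):
    """List (position, new_codon, new_residue) for every single A->G change
    whose product is present in the partial table above."""
    results = []
    for i, b in enumerate(codon):
        if b == "A":
            new = substitute(codon, i, "G")
            if new in CODON_TABLE:
                results.append((i, new, CODON_TABLE[new]))
    return results

# Assumed starting codons (hypothetical; the actual codon is not stated).
for codon in ("GAA", "GAG"):
    for pos, new, aa in a_to_g_changes(codon):
        print(f"{codon} ({CODON_TABLE[codon]}) -> {new} ({aa}), "
              f"A->G at codon position {pos + 1}")
```

Under these assumptions, an A to G change at the second codon position converts either glutamic acid codon into a glycine codon, which is consistent with the E to G difference described above.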

A CH14-2a16-1 cloned insert has been used to probe the level of relative expression in
polyadenylated RNA from a panel of tissue sources obtained from CLONTECH, as in Example 4.
The relative CH14-2a16-1 expression observed at the mRNA level is shown in Table 11:
TABLE 11: Northern blot analysis

Tissue              CH14-2a16-1 mRNA
heart               +
brain               [illegible]
placenta            +
lung                +
liver               +
skeletal muscle     +
kidney              [illegible]
pancreas            [illegible]
spleen              +
thymus              +
prostate            +
testis              ++++
ovary               +
small intestine     +
colon               +
peripheral blood    +/-

++++  Very high
+++   High
++    Medium
+     Low
+/-   Very low

CH14-2a16-1 mRNA was particularly high in testis. The level of expression in breast cancer
cell lines is also quite high, since the Northern analysis performed on these lines was conducted on
total cellular RNA. It is likely that the CH14-2a16-1 gene is involved in a biological process that is
typical to the tissue types showing medium to high levels of expression, which may relate to increased
tissue growth or metabolism.

Five motifs corresponding to a zinc finger protein have been found in the CH14-2a16-1
nucleotide sequence. Further zinc finger motifs may be present in CH14-2a16-1 in the upstream
direction. Zinc finger motifs are present, for example, in RNA polymerases I, II, and III from S.
cerevisiae, and are related to the zinc knuckle family of RNA/ssDNA-binding proteins found in the HIV
nucleocapsid protein. The actual sequence observed in each of the five zinc finger motifs of
CH14-2a16-1 is:

Cys-(Xaa)5-Cys-(Xaa)4-Cys-(Xaa)3-His (SEQ. ID NO:36) or

Cys-(Xaa)5-Cys-(Xaa)5-Cys-(Xaa)3-His (SEQ. ID NO:39)

which is indicated in Figure 20 by underlining. This is identical to the 7 zinc finger motifs of NAB2,
which make up an RNA/ssDNA binding region (Anderson et al.). Accordingly, the CH14-2a16-1 gene
product is suspected of having DNA or RNA binding activity, and may be specific for polyadenylated
RNA. It may very well play a role in the regulation of gene replication, transcription, the processing of
hnRNA into mature mRNA, the export of mRNA from the nucleus to the cytoplasm, or translation into
protein. This role in turn may be closely implicated in cell growth or proliferation, particularly as
manifest in tumor cells.
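
A motif of this kind can be located in a translated sequence with a simple pattern search. The following is a minimal sketch, not the method used to produce Figure 20. It assumes the motif reads Cys-(Xaa)5-Cys-(Xaa)4 or 5-Cys-(Xaa)3-His; the final spacer length of 3 is an assumption, since that part of the printed motif is not fully legible. The peptide in the example is invented for illustration and is not taken from CH14-2a16-1.

```python
import re

# Assumed motif: Cys-X5-Cys-X(4 or 5)-Cys-X3-His.
# The final spacer length (3) is an assumption, not confirmed by the text.
ZINC_FINGER = re.compile(r"C.{5}C.{4,5}C.{3}H")

def find_zinc_fingers(protein: str):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in ZINC_FINGER.finditer(protein.upper())]

# Hypothetical peptide, invented for illustration only.
example = "MKT" + "CAAAAACGGGGCSSSH" + "LLE" + "CPPPPPCTTTTTCQQQH" + "K"

for start, motif in find_zinc_fingers(example):
    print(start, motif)
```

A search of this kind reports candidate motifs only; whether a candidate actually coordinates zinc or binds nucleic acid would have to be established experimentally.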

Example 8: Identification of other cancer-associated genes

cDNA fragments corresponding to additional cancer-associated genes are obtained by
applying the techniques of Examples 1 & 2 with appropriate adaptations. As before, cancer cells
are selected for use in differential display of RNA, based on whether they share a duplicated
chromosomal region according to Table 12:



TABLE 12: Cancer cell lines sharing duplicated chromosomal regions

Chromosomal
location        Cancer type & references

1p22-32         small cell (Levin 1994)
1p22            bladder (Kallioniemi 1995)
1p32-33         rhabdomyosarcoma (Steilen-Gimbel); breast (Ried 1995);
                small cell lung (Ried 1994)
1q21-22         sarcoma (Forus 1995a & b); breast (Muleris 1994a)
1q24            small cell (Levin 1994)
1q31            bladder (Kallioniemi 1995)
1q32            glioma (Muleris 1994b; Schrock)
1q              head and neck (Speicher 1995), breast (Muleris 1994a)
2p23            small cell lung (Ried 1994)
2p24-25         small cell lung (Levin 1994)
2               head and neck (Speicher 1995)
2q              head and neck (Speicher 1995)
2q33-36         head and neck (Speicher 1995)
3p22-24         bladder (Voorter), small cell (Levin 1994)
3q24-26         bladder (Kallioniemi 1995), glioma (Kim), osteosarcoma (Tarkkanen)
3q25-26         ovarian (Iwabuchi)
3q26-term       head and neck (Speicher 1995)
3q              small cell lung (Levin [year illegible]; Ried [year illegible]),
                head and neck (Speicher 1995)
4q12            glioma (Schrock)
5p              small cell lung (Levin 1994 & 1995; Ried 1994)
5p16.1          glioma (Muleris 1994b)
6p              osteosarcoma (Forus 1995a); breast (Ried 1995)
6p21-term       melanoma (Speicher)
7p              glioma (Schlegel 1994 & 1996; may be EGFR)
7p11-12         glioma (Muleris 1994b; Schrock), small cell lung (Ried 1994)
7q21-32         glioma (Kim; Muleris 1994b; Schrock)
7q21-[illegible]  head and neck (Speicher), glioma (Schrock)
7q33-term       head and neck (Speicher 1995)
7               colon (Schlegel 1995); glioma (Kim), head and neck (Speicher);
                prostate (Visakorpi)
8q              small cell lung (Ried 1994)
8q21            bladder (Kallioniemi 1995)
8q24            myeloid leukemia (Mohamed)
[illegible]     glioma (Kim; Muleris 1994b), breast (Muleris 1994a)
[illegible]     small cell (Levin 1994; Ried 1994); breast (Muleris 1994a)
[illegible]     sarcoma (Forus 1995a), melanoma (Speicher)
8q              breast (Ried 1995; Isola; Muleris 1994a), small cell lung (Levin 1994 & 1995),
                B-cell leukemias (Bentz 1994a), myeloid leukemia (Bentz 1994b), glioma (Schlegel),
                head and neck (Speicher 1995), prostate (Cher, Visakorpi)
9               head and neck (Speicher)
9p              head and neck (Speicher)
9[illegible]    glioma (Muleris 1994b)
9p13            breast (Muleris 1994a)
10p             head and neck (Speicher 1995)
10p13-14        bladder (Voorter)
10q22           breast (Muleris 1994a)
11q13           head and neck (Speicher 1995), breast (Muleris 1994a)
12              B-cell leukemias (Bentz 1995a)
12p             head and neck (Speicher 1995), glioma (Schrock)
12q             glioma (Schlegel 1994)
12q12-15        bladder (Voorter), osteosarcoma (Tarkkanen), liposarcoma (Suijkerbuijk)
12q21.3-22      liposarcoma (Suijkerbuijk)
1[illegible]    colon (Schlegel 1995)
1[illegible]    breast (Ried 1995), head and neck (Speicher 1995)
13q21-34        bladder (Kallioniemi 1995)
13q32-term      head and neck (Speicher 1995), small cell lung (Ried 1994)
14q             head and neck (Speicher 1995)
15q26           breast (Muleris 1994a)
16              head and neck (Speicher 1995)
16p             breast (Ried 1995)
16p11           breast (Muleris [year illegible])
17              head and neck (Speicher 1995)
17p11-12        osteosarcoma (Forus 1995a; Tarkkanen); [remainder of entry illegible]
17q             breast (Ried 1995), small cell lung (Ried 1994)
17q21.1         breast (Muleris 1994a)
17q22-23        bladder (Voorter), breast (Muleris 1994a)
17q22-24        breast (Kallioniemi 1994)
18p11           bladder (Voorter)
19q13.1         small cell lung (Ried 1994)
20p             head and neck (Speicher 1995)
20q             ovarian (Iwabuchi), colon (Schlegel 1995), breast (Isola; Tanner)
20q13.3         breast (Muleris 1994a; Kallioniemi 1994)
22q             head and neck (Speicher 1995)
22q11.13        bladder (Voorter), glioma (Schrock)
X               prostate (Visakorpi)
Xq              small cell lung (Levin 1995)
Xq24            small cell (Levin 1994)
Xq11-13         prostate (Visakorpi), osteosarcoma (Tarkkanen)



Control RNA is prepared from normal tissues to match that of the cancer cells in the
experiment. Normal tissue is obtained from autopsy, biopsy, or surgical resection. Absence of
neoplastic cells in the control tissue is confirmed, if necessary, by standard histological techniques.
cDNA corresponding to RNA that is overabundant in cancer cells and duplicated in a proportion of
the same cells is characterized further, as in Examples 3-7. Additional cDNA comprising an entire
protein-product encoding region is rescued or selected according to standard molecular biology
techniques as described elsewhere in this disclosure.

Claims

What is claimed as the invention is:

1. An isolated polynucleotide comprising a linear sequence of at least 10 nucleotides identical to
a linear sequence contained in a polynucleotide selected from the group consisting of
CH8-2a13-1, CH13-2a12-1, CH14-2a16-1, and CH1-9a11-2.

2. An isolated polynucleotide comprising a linear sequence of at least 40 consecutive
nucleotides at least 90% identical to a linear sequence contained in a sequence selected
from the group consisting of SEQ. ID NO:15, SEQ. ID NO:18, SEQ. ID NO:21, SEQ. ID
NO:23, SEQ. ID NO:26, SEQ. ID NO:29, SEQ. ID NO:31, SEQ. ID NO:33, and SEQ. ID
NO:35; but not in any of SEQ. ID NOS:1, 3, 5, and 7.

3. The isolated polynucleotide of claim 2, comprising a linear sequence of at least 100
consecutive nucleotides at least 90% identical to a sequence contained in the selected
sequence.

4. The isolated polynucleotide of claim 2, comprising a linear sequence of at least 40
consecutive nucleotides at least 95% identical to a sequence contained in the selected
sequence.

5. An isolated polynucleotide comprising a linear sequence of at least 40 consecutive
nucleotides that hybridizes with a DNA having a sequence selected from the group consisting
of SEQ. ID NO:15, SEQ. ID NO:18, SEQ. ID NO:21, SEQ. ID NO:23, SEQ. ID NO:26, SEQ.
ID NO:29, SEQ. ID NO:31, SEQ. ID NO:33, and SEQ. ID NO:35; under conditions where it
does not hybridize with SEQ. ID NOS:1, 3, 5, 7, or any other DNA from a human cell.

6. The isolated polynucleotide of claim 5, wherein the linear sequence is at least 100
consecutive nucleotides.

7. An isolated polynucleotide comprising a sequence of at least 40 consecutive nucleotides that
hybridizes with an RNA having a sequence selected from the group consisting of SEQ. ID
NO:15, SEQ. ID NO:18, SEQ. ID NO:21, SEQ. ID NO:23, SEQ. ID NO:26, SEQ. ID NO:29,
SEQ. ID NO:31, SEQ. ID NO:33, and SEQ. ID NO:35; under conditions where it does not
hybridize with SEQ. ID NOS:1, 3, 5, 7, or any other RNA from a human cell.
8. The isolated polynucleotide of claim 7, wherein the linear sequence is at least 100
consecutive nucleotides.

9. The isolated polynucleotide of any of claims 2-8, wherein said linear sequence is contained in
a duplicated gene or overabundant RNA in cancerous cells.

10. The isolated polynucleotide of any of claims 2-8, which is a CH13-2a12-1 polynucleotide, and
is contained in an encoding region for a protein or RNA molecule that controls cell
proliferation.

11. The isolated polynucleotide of any of claims 2-8, which is a CH14-2a16-1 polynucleotide, and
is contained in an encoding region for a protein with DNA or RNA binding activity.

12. The isolated polynucleotide of any of claims 2-8, present in a recombinant plasmid deposited
under ATCC Accession No. 98074.

13. The isolated polynucleotide of any of claims 2-8, present in a recombinant phage deposited
under ATCC Accession No. 97595.

14. The isolated polynucleotide of any of claims 2-8, present in the XBCBT474 cDNA library
deposited under ATCC Accession No. 97594.

15. An isolated polynucleotide comprising a linear sequence of polynucleotides essentially
identical to a sequence selected from the group consisting of SEQ. ID NO:15, SEQ. ID
NO:18, SEQ. ID NO:21, SEQ. ID NO:23, SEQ. ID NO:26, SEQ. ID NO:29, SEQ. ID NO:31, SEQ.
ID NO:33, and SEQ. ID NO:35.

16. An isolated polypeptide comprising a linear sequence of at least 5 amino acid residues
identical to a sequence encoded by a polynucleotide selected from the group consisting of
CH1-9a11-2, CH8-2a13-1, CH13-2a12-1, and CH14-2a16-1.

17. An isolated polypeptide comprising a linear sequence of at least 5 consecutive amino acids
identical to a linear sequence contained in a sequence selected from the group consisting of
SEQ. ID NO:17, SEQ. ID NO:20, SEQ. ID NO:22, SEQ. ID NO:24, SEQ. ID NO:28, SEQ. ID
NO:30, SEQ. ID NO:32, SEQ. ID NO:34, and SEQ. ID NO:37; but not in any of SEQ. ID
NOS:2, 4, 6, and 8.

18. The isolated polypeptide of claim 17, comprising a linear sequence of at least 15 consecutive
amino acids at least 90% identical to a linear sequence contained in the selected sequence.
19. The isolated polypeptide of claim 17 or 18, wherein said linear sequence is encoded in a
duplicated gene or overabundant RNA in cancerous cells.

20. The isolated polypeptide of claim 17 or 18, which is overexpressed in cancerous cells.

21. The isolated polypeptide of claim 17 or 18, wherein the polynucleotide selected from said
group is a CH1-9a11-2 polynucleotide, and the polypeptide is a transmembrane polypeptide.

22. An isolated polypeptide comprising a linear sequence of amino acids essentially identical to a
sequence selected from the group consisting of SEQ. ID NO:17, SEQ. ID NO:20, SEQ. ID
NO:22, SEQ. ID NO:24, SEQ. ID NO:28, SEQ. ID NO:30, SEQ. ID NO:32, SEQ. ID NO:34,
and SEQ. ID NO:37; but not in any of SEQ. ID NOS:2, 4, 6, and 8.

23. An isolated polynucleotide comprising an encoding sequence for the polypeptide of any of
claims 17 to 22.

24. A monoclonal or isolated polyclonal antibody specific for the polypeptide of claim 22.

25. A method of detecting gene duplication in cancerous cells, comprising the steps of:

a) reacting DNA contained in a clinical sample with a reagent comprising the
polynucleotide of claims 2-8, said clinical sample having been obtained from an
individual suspected of having cancerous cells; and

b) comparing the amount of any complexes formed between the reagent and the DNA in
the clinical sample with the amount of any complexes formed between the reagent and
DNA in a control sample.

26. A method of detecting overabundance of RNA in cancerous cells, comprising the steps of:

a) reacting RNA contained in a clinical sample with a reagent comprising the
polynucleotide of claims 2-8, said clinical sample having been obtained from an individual
suspected of having cancerous cells; and

b) comparing the amount of any complexes formed between the reagent and the RNA in
the clinical sample with the amount of any complexes formed between the reagent and
RNA in a control sample.
27. A method of determining gene duplication or overabundance of RNA in cancerous cells,
comprising the steps of:

a) amplifying DNA or RNA in a clinical sample with a primer comprising the polynucleotide
of claims 2-8 to yield an amplified polynucleotide, said clinical sample having been
obtained from an individual suspected of having cancerous cells; and

b) comparing the amount of polynucleotide amplified from the DNA or RNA with the
amount of polynucleotide amplified from DNA or RNA from a control sample.

28. A method of screening for cancer associated with a gene duplication in an individual,
comprising the steps of:

a) determining gene duplication in cells from the individual according to the method of claim
25; and

b) correlating any gene duplication determined in step a) with an increased risk for the
cancer.

29. A method of screening for cancer associated with overexpression of RNA in an individual,
comprising the steps of:

a) determining overexpression of RNA in cells from the individual according to the method
of claim 26; and

b) correlating any RNA overexpression determined in step a) with an increased risk for the
cancer.

30. A method of screening for cancer associated with a gene duplication or overexpression of
RNA in an individual, comprising the steps of:

a) determining gene duplication or overexpression of RNA in cells from the individual
according to the method of claim 27; and

b) correlating any gene duplication or overexpression of RNA determined in step a) with an
increased risk for the cancer.
31. The method of any of claims 28-30, which is a screening method for breast cancer.

32. A diagnostic kit for detecting gene duplication or RNA overabundance in cells contained in an
individual as manifest in a clinical sample, comprising a reagent and a buffer in suitable
packaging, wherein the reagent comprises the polynucleotide of any of claims 2-8.

33. A method for detecting altered protein expression in cancerous cells, comprising the steps of:

a) reacting a polypeptide contained in a clinical sample with a reagent comprising the
antibody of claim 24, said clinical sample having been obtained from an individual
suspected of having cancerous cells; and

b) comparing the amount of any complexes formed between the reagent and the
polypeptide in the clinical sample with the amount of any complexes formed between the
reagent and a polypeptide in a control sample.

34. A diagnostic kit for detecting a polypeptide present in a clinical sample, comprising a reagent
and a buffer in suitable packaging, wherein the reagent comprises the antibody of claim 24.

35. A host cell genetically altered by the polynucleotide of any of claims 2 to 8 or claim 23.

36. A method of screening a pharmaceutical candidate, comprising the steps of:

a) separating progeny of the cell of claim 35 into a first group and a second group;

b) treating the first group of cells with the pharmaceutical candidate;

c) not treating the second group of cells with the pharmaceutical candidate; and

d) comparing the phenotype of the treated cells with that of the untreated cells.

37. A pharmaceutical preparation for use in cancer therapy, comprising the polynucleotide of
claim 2 to 8 or claim 23, said preparation being capable of reducing the pathology of
cancerous cells.

38. A method for treating an individual bearing cancerous cells, comprising administering the
pharmaceutical preparation of claim 37.

39. A pharmaceutical preparation for use in cancer therapy, comprising the antibody of claim 24,
said preparation being capable of reducing the pathology of cancerous cells.

40. A method for treating an individual bearing cancerous cells, comprising administering the
pharmaceutical preparation of claim 39.
41. A pharmaceutical preparation comprising the polypeptide of claim 17 or 18 in an
immunogenic form, and a pharmaceutically compatible excipient.

42. A method for treatment of cancer, comprising administration of the pharmaceutical
preparation of claim 41.

43. A method for obtaining cDNA corresponding to a gene that is duplicated or overexpressed
in cancer, comprising the steps of:

a) supplying an RNA preparation from control cells;

b) supplying RNA preparations from at least two different cancer cells;

c) displaying cDNA corresponding to the RNA preparations of step a) and step b) such that
different cDNA corresponding to different RNA in each preparation are displayed
separately;

d) selecting cDNA corresponding to RNA that is present in greater abundance in the
cancer cells of step b) relative to the control cells of step a);

e) supplying a digested DNA preparation from control cells;

f) supplying digested DNA preparations from at least two different cancer cells;

g) hybridizing the cDNA of step d) with the digested DNA preparations of step e) and step
f); and

h) further selecting cDNA from the cDNA of step d) corresponding to a gene that is
duplicated in the cancer cells of step f) relative to the control cells of step e).

44. The method of claim 43, wherein the two different cancer cells used to supply RNA in step
b) share a duplicated gene in the same region of a chromosome.

45. The method of claim 43, wherein RNA preparations from at least three different cancer
cells are supplied in step b).

46. The method of claim 43, wherein the three different cancer cells used to supply RNA in
step b) share a duplicated gene in the same region of a chromosome.

47. The method of claim 43, wherein the control cells of step a) are uncultured.

48. The method of claim 43, further comprising supplying a digested mitochondrial DNA
preparation; hybridizing the cDNA of step h) with the digested mitochondrial DNA
preparation; and further selecting cDNA from the cDNA of step h) corresponding to genes
that do not hybridize with the digested mitochondrial DNA preparation.
49. The method of claim 43, further comprising the steps of:

i) supplying an RNA preparation from control cells;

j) supplying RNA preparations from at least two different cancer cells;

k) hybridizing the cDNA of step h) with the RNA preparations of step i) and step j); and

l) further selecting cDNA from the cDNA of step h) corresponding to RNA that is present in
greater abundance in the cancer cells of step j) relative to the control cells of step i).

50. The method of claim 49, wherein the gene to which the cDNA corresponds is not
duplicated in at least one of the cancer cells used to supply the RNA in step j) relative to
the control cells of step e).

51. The method of claim 43, wherein the two different cancer cells used to supply the RNA
preparations in step b) are breast cancer cells.

52. The method of claim 43, wherein the two different cancer cells used to supply the RNA
preparations in step b) are from a common type of cancer, wherein the type of cancer is
selected from the group consisting of lung cancer, glioblastoma, pancreatic cancer, colon
cancer, prostate cancer, hepatoma, and myeloma.

53. The method of claim 43, wherein the two different cancer cells used to supply the digested
DNA preparations in step f) are breast cancer cells.

54. The method of claim 43, wherein the two different cancer cells used to supply the digested DNA
preparations in step f) are from a common type of cancer, wherein the type of cancer is
selected from the group consisting of lung cancer, glioblastoma, pancreatic cancer, colon
cancer, prostate cancer, hepatoma, and myeloma.

55. A method for obtaining cDNA corresponding to a gene that is deleted or underexpressed in
cancer, comprising the steps of:

a) supplying an RNA preparation from control cells;

b) supplying RNA preparations from at least two different cancer cells that share a deleted
gene in the same region of a chromosome;

c) displaying cDNA corresponding to the RNA preparations of step a) and step b) such that
different cDNA corresponding to different RNA in each preparation are displayed
separately; and

d) selecting cDNA corresponding to RNA that is present in lower abundance in the cancer
cells of step b) relative to the control cells of step a).
56. The method of claim 55, further comprising the steps of:

e) supplying a digested DNA preparation from control cells;

f) supplying digested DNA preparations from at least two different cancer cells;

g) hybridizing the cDNA of step d) with the digested DNA preparations of step e) and step
f); and

h) further selecting cDNA from the cDNA of step d) corresponding to a gene that is deleted
in the cancer cells of step f) relative to the control cells of step e).

57. A method for characterizing a gene that is duplicated or has altered expression in cancer,
comprising obtaining cDNA corresponding to the gene according to the method of any of
claims 43-56, and then sequencing the cDNA.

58. A method of screening a candidate drug for cancer treatment, comprising obtaining cDNA
corresponding to a gene that is duplicated or has altered expression in cancer according to
the method of any of claims 43-56, and comparing the effect of the candidate drug on a
cell genetically altered with the cDNA with the effect on a cell not genetically altered with
the cDNA.
Figure 1
Figure 2
Figure 5
Figure 6

     Strand (sense)    1st base   Sequence (5'->3')

 1.  pch1-t7-1f        1123       CGG GAG GTT TCA GAT CGA C
 2.  pch1-t7-2f        1437       GCG CTG CAA GTA CAA AAT TG
 3.  pch1-t7-3f        1729       TCT AAA GTC CAA GAC CAA GG
 4.  pch1-t7-4f        1987       CAG AAA TTA TGG TTT CTA CC
 5.  pch1-t7-5f        2266       CaG GAA GAG GAG GGA TAA C
 6.  pch1-sp6-3fb      2684       AAA CAT ACA CAA TAA ACA C
 7.  pch1-sp6-2rb      2966       TTG GCA GCG ACT GTA TTT G
 8.  pch1-sp6-1rb      3283       CCT GAT TTT ATA GAA GCC CC

     Strand (antisense)

 9.  pch1-sp6-1f       3302       GGG GCT TCT ATA AAA TCA GG
10.  pch1-sp6-2f       2987       ATT CAA ATA CAG TTG CTG C
11.  pch1-sp6-3f       2705       TTA GTG TTT ATT GTG TAT G
12.  pch1-sp6-4f       2458       AGT GTT CAT TTC CAG TGA G
13.  pch1-sp6-5f       2066       CTT TGT TCT TGG ACT TTA G
14.  pch1-t7-3fb       1748       CCT TGG TCT TGG ACT TTA G
15.  pch1-t7-2rb       1445       AAT TTT GTA CTT GCA GCG C
16.  pch1-t7-1rb       1141       GTC GAT CTG AAA CCT CCC G
17.  CH1a              1063       GTG CCT GTA GCA ACT GGA TGG C
18.  CH1b              1079       GTC ATG TTG GTC AGC TGT GCC
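
Several sense and antisense primers in this listing appear to cover the same site on opposite strands. The following is a minimal sketch of how such pairs can be checked by reverse complementation. The primer names are given as read from the listing (pch1-...), and the sketch assumes that the "1st base" of an antisense primer is its 5' base expressed in sense-strand coordinates; both points are assumptions rather than statements from the source. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# (1st base, sequence) as printed in the primer listing, spaces removed.
SENSE = {
    "pch1-t7-1f": (1123, "CGGGAGGTTTCAGATCGAC"),
    "pch1-sp6-1rb": (3283, "CCTGATTTTATAGAAGCCCC"),
}
ANTISENSE = {
    "pch1-t7-1rb": (1141, "GTCGATCTGAAACCTCCCG"),
    "pch1-sp6-1f": (3302, "GGGGCTTCTATAAAATCAGG"),
}

def is_exact_pair(sense, antisense) -> bool:
    """True if the antisense primer is the reverse complement of the sense
    primer and its 1st base coincides with the last base of the sense primer
    (assumed coordinate convention)."""
    s_start, s_seq = sense
    a_start, a_seq = antisense
    same_site = reverse_complement(s_seq) == a_seq
    same_coords = s_start + len(s_seq) - 1 == a_start
    return same_site and same_coords

for s_name, a_name in [("pch1-t7-1f", "pch1-t7-1rb"),
                       ("pch1-sp6-1rb", "pch1-sp6-1f")]:
    print(s_name, a_name, is_exact_pair(SENSE[s_name], ANTISENSE[a_name]))
```

Under the stated coordinate assumption, primers 1 and 16, and primers 8 and 9, are exact reverse complements covering positions 1123-1141 and 3283-3302 respectively.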
Figure 8(A)



1 GAATACATAT AXAAATCGTC TTCAGTTAGA OT i U-'lC m- ATCGC3CAGCG 

51 CAGCCGAACT GCTTTCflCTA AAGGAAAAGA TTATCTTCTC TTAGCTCAAC 

101 CACCCTTACT ACTTCCTOCG GAATCAGTAG AOCTrrCAGT ATTOCAACCT 

AATTCGAAAA TACGAATATA GAAAGGGAAG CTCAAACTCT 
201 TGTTCTGGGT GATTTAAGTA CHAGTATGCA CCAGGATCAC ITCGTCAATC 

251 ACACTCTAGA TGCAGTTQAA CITCAACCAA GCCATTCOCA AACTCTTTCT 

301 CAGTCTCnC TTTTAGATAT TACCCCAGAA ATCAATCCCT TCCCTAAAAT 

Ini f^^^^ GAGTCTCTIG AATATCAQC3C AGGACATATA CXATCACCAG 

JSJ ^gftgft^IT^T GTTOftGATCG ATftftat3AAAC AGAACAAAAG 

451 TCTGAGAGCT TrAGTTCTAT AQAGAAACCA TCiaaTRCCT ATCAAACAAA 

501 TAAAGTTAAT GAGTTAATG6 ATAATATTAT AAAAGftAGAT AiSaaCT^ 

551 TGCAAATOIT CACAAAGCTC TCTCAAACAA TAOTOcSc AflS^SS 

601 GCCftCTGTAC OOGACAATGA AGATCGGGAA GCCAAAATGA ATA^OOTGA 

651 CACftGCAAAG CAAACTTTGA TITCTCTTCr GGATTCT^ tSScSS 

701 AAGXRAAftGA AGAAGAACAG TCTCCAGAAG ATCCCTtS gSaSctS 

751 CAGAGGACAG CTACAGATTT TTATCCTCAA TIGCAAAACT CT^AtcJ 

I?? ^i^^ tS^"^ TTCTACA-TOG ATCAAACCAA S^SS 

851 TATTIATGAG ACTTAATAAT CGTATTAAAG CCTTAGAAGT TAftCATOTCT 

901 CTCAGTCGTC GCTATCTOGA GGAGCTIAGC CAAATOiaCC GA^A^S 

951 GGAAGAAATG CAAAAGGCTT TCAACAAAAC AATCGI^ 

1001 CTTCAAGAAT AGCAGAGGAG CAGGATCaGC GgSaCTGA SStC^ 

1051 TTGCTACAGG CACAGCTGAC CaACATCACA ^CtSt ^^S? 

1101 AGCAACAGTA GOiGAATPGA AACGGGAGGT TICAGATCGA CAAaStaTC 

^11} ^"^^"^^ TnGGTrCTT TCTCTIGTCT TCgStGAT GCTTtSaS 
JIS TCAAITTCAT GGAGaSS SJSS^S 

1251 TCCTAAAAGT AATCAGTATC CAAGCCCTAA AAGGTCTTIC TCITCCTATG 

«Sl ^^^^.^ TTTGAAAAGA AGAACTICAT TCCcJSSt 

1351 TCTCTACAGT TAACTCGCAA AGAAGTAGAC CCAAATCATT TOEACAITOT 
ii^} AAGlTPrCTC CAGAAAAGAA GAAGAAGCGC tSagSS 

1451 AAAlTGflAAA AATTGAGACC ATAAAGCCTC AAGAACCATT GOoSSS 

tlli S^I^ ACATAAAAGG AAGAAAGCCC TPiaCGAACC ^AgSSS 

1551 TTCTAATATG GGAGAAGTTT ATCACTCTTC TEAraAAGGT CCTO^TCTC 

1601 AAGGAAGCTC AGAAACTTCA TCACAGICAG AAGAGTCCTA TtStctS? 

llll SSS^ GTOCAATCGA HSSS SSSS 

1701 TGAGAAGAGG GCTTTAAAAC GAAGACGATC TAAAGTCCAA GACCAAGGAA 
nil ^^"^"^ CAGACTAAGT CGGgS^ 

1801 CATGACATAA TCAAAGGAfiA CAAAGAGATC ACCGTX3GGAA CATTTOGICT 
'T ACAGCftGTC TCGGGACAT A TCTAAAATEA ATIGAACTTT tSSSgS 
TTGTTGTrcT TTGAAGAACA GlSiSS 

1951 TTGGGGGAGG GAGAAAATAT TAATGGGAAA GGCATTCaGA AATTMGOIT 

2001 TCTACCTITT TAAAAAGTAG AaXSQGAlTOT GCTCaScTT GctSSS 

2051 CTACAGTITT ACAAAGCTGA TCACTOXTA TAAGGACaS ^ScS? 

2101 CTATAAAGAT GTlTnTCAC AAGATTAATT ACTCGGAC^ A^ATtS 
lll^ TCCTlftGOTG QGA-EAQGAAT GAAAGOCTAA AcScttcS 

nil SJSSS "^J^ TGCAccncc 

2251 CTATTTATA A TGCCACTCGA AGAGGAGGGA TAACTITnc TCTTAITIGA 

2301 TOTCrrrTAT AACTrrGTTA GOnTTTCAA GCTCCAAACA SSSJ 

2351 TOQAGGQQCT CTCTGCCTCA AGCTCAGGAG TCiSm^ SctSS 

2401 GATCCTAAAA AC7TCCCAAC TCGATCTITC TTTAGCAAAC TCActSSaJ 

^^o^ Ig^PTF^ ATGGAATTrr TAAOKnCTT CTCTtSg^ gSg^^ 

2501 CTCTTOITAT TTTCACTIAT TCAGGCTOGA TTaCTTC^ C^T^ 

2551 AACTCAATCA GGAAAAAATC CCXACAGGAT CTmTT^ AA^CTGA 
Figure 8(B)

2601 TATATGCAGA CAAATmrc ACAAATTCAC CTTTTAAACA CXSAOSTTAAC
2651 CXSATTTGTGA [illegible] TAGCTTACAT TTTAAACATA CACAATAAAC
2701 ACTAA.TCCTC CAAACmCA CTCTTTTTAT TAGTATQAAT ATAAAATTTG
2751 AAOGTITGGC CAATTAGTAC AAGT CTCATG ATATAATCAC AGOCTGCATA
2801 CATAIOCACA GATCCAGTTA GTGAGTTTGT CAAGCTTAAT CTAATTGGTT
2851 AAGTCTAAAG AGATTATTAT TCCTTGATGT TTGCTTTGTA 1TGGCTACAA
2901 ATGTGCAGAG GTAATACATA TCTGATGTCG ATCTCTCTGT ClTiTlTlTr
2951 GTCTTTAAAA AATAATTGGC AGCAACTGTA TTTGAATAAA ATGATTTCTT
3001 AGTATGATTG TACAGTAATG AATGAAAGTG GAACATGTTT CTTTTTGAAA
3051 GGGAGAGAAT TGACCATTTA TTGTTCTCAT GTTTAAGTTA TAACTTATTC
3101 AGCACTTTTA G1!AGTCATAA CTGTTTTTAA ACTTGCCTAA TACCTTTCTT
3151 GGGTATTOTT TCTAATGTSA CTTATTTAAC GCCTTCTITG TTTGTTTAAG
3201 TTGCTC3CTTT AGGTTAACAG CGTGTTTTAG AAGATTTAAA TTTCTTTCCT
3251 GTCTGCACAA TTAGCTATTC AGAGCAAGAG GGCCTGATTT TATAGAAGCC
3301 CCTTGAAAAG AGGTCCAGAT GA QAGCAG AG ATACAGTCAG AAATEATGTG
3351 ATCTGTGTGT TGTGGGAAGA GAATTTTCAA TATGTAACTA CXX3AGCTGTA
3401 GTGCCATTAG AAACT6TGAA TTICCAAATA AATCTGAACA CTTGTCTTTA
3451 TT
Figure 9



1 EYIYKWCSVR VALVRQRSRT ALSKGKDYLV LAQPPLLLPA ESVDVSVLOP 

51 LSGELEOTOT EREAETWLG nLSSSMHQOD LVMHTVEAVE LEPSHSoS!s 

101 QSUXDITPE INPUPKIEVS ESVEYEAGHl PSPVIPQESS VEIDNETEXDK 

151 SESFSSIEKP SITVETNKVN EUCNIIKED MNSMQIFTKL SEmvppSir 

201 ATVPENEDGE AKMNIADTAK QTLISWDSS SLPEVKEEEQ SPEDALLRGL 

251 QRTATDFYAE LQNSTDUSyA NGNLVHGSNQ KESVF12RU^ RIKALEVNMS 

301 LSC3RYLEELS QRYKKQMEEM QKAFNKTIVK LQNTSRIAEE ODOROIEAIO 

351 LLQAQLTOMT QLVSNLSATV AELKREVSDR QSYLVISLVL CWlisS^ 

«i ?f?y!°^ GOnSKLPKS NQypSPKRCF SSVOTMNTJCR RTSFPLMRSK 
f^^T^^ ^YIVEPL KFSPEKKKKR CKYKIEKICT IKPEEPLHPI 

«i FTOQRDFSNM GEVYHSSVKG PPSBGSSETS SQSEESYTCG 

^n^ ^^S^ Q St»™™ AUCRRRSKVQ DQGKLiraXI QTKSGSLPSL 

«i TTO3TOVTAV SGHI-N.IUF SYRRUXTCS UCNSL-^G 

^^nJ GIQKWflFLPF •KVDGIVUaL G-ATVLQS* SLPIRTMVDI 

701 L^RCFFTRLI TGTK71WKPS SIjGGIGMKA* TSSFSFVPIS CTFPVLCAFC 

751 LFIMPLEEEG -IJIXFDFFY NFVRFLKLQT LQCFEGVCA. SSGVWIROSK 

801 DPKNLPTGSL FSKLTGNEHL MEFLSLFCV DGDALVinY SGWlTSmVT 

851 NSMRKKSLQD LKWyiTDICR QIFDKFTF-T RR*PICBGFL •LTF-TTTIN 

o2n 3^E5?!"^ -VEYKI.RFG QLVQVS.YNH SUnYAQIQL VSLSSLI-LV 

951 KSKEIIIP-C LLCIGYKCAE VIHM*CRCLC LFFCL^IG SNCI-IK-PI^ 

1001 SMIVQ-.MKV EHVSF.KGEN •PFIWMFKL .LIEHP.*.* LFmLmiTl! 

1051 GYCL^CDLFN AFFVCLSCCF RLTACFRRFK FLSCLHN.LF RARGPDFIEA 

1101 P*KEV»«RAE IQ-EIM.SVC CQKRIPWM.L RSCSAIRNCE FPNKSEHLSL 



1 EYIYKWCSVR VALYRQRSRT ALSKGKDYLV LAQPPLLLPA ESVDVSVLQP

51 LSGELENTOI EREAETWLG DLSSSMHQDD LVNHTVDAVE LEPSHSQTLS

101 QSLLLDITPE INPLPKIEVS ESVEYEAGHI PSPVIPQESS VEIENETEQK

151 SESFSSIEKP SnYETOKVH EIWENIIKED MNSQIPIKL SETTVPFINT

201 ATVPEMEDGE AKMNXAOTAK QTLISWDSS SLE^EVKEEEQ SPEDALLRGL

251 QRTATaF!CAE L(9ISTDEL6yA MGML VHGSN Q KESVFMRI23N RIKALEVNMS

301 LSGRVLEELS QRYRKOMEEH QKAFTaKTIVK LQNTSRIAEE QDQRQTEAIQ

351 LLQAQLIMMT QLVSNLSATV MSLKBJErJSVR Q SYLVISLVL CWLGT^TT/t^

401 QRCRNTSQFD GDYISKLPKS NQYPSPKRCF SSYTOMNLKR RTSFPIllRSk

451 SLQLTCSKEVD PNDLYIVEPL KFSPEKKKKR CKYKIEKIET IKPEEPLHPI

501 ANGDIKGRKP FTTIQRDFSNM GEVYHSSYKG PPSEGSSBTS SQSEESYFCG

551 ISACTS LCMG QSQKTKTEKR ALKRRRSKVQ DQCTLIKTLl QTKSGSLPSL

601 HDIIKGNKEI TVGTFGVTAV SGHI
Figure 10



+ strand (sense)                    sequence (5'->3')

 1. pch8-sp6-1f      369    GCT AAG CCA GAG CTA [illegible]
 2. pch8-sp6-2f      677    TCT GAT CTT CTG CTG ATT C
 3. pch8-1fa        1238    TCT GAA CTG CCT GAG AGA C
 4. pch8-2f         1462    CCA [illegible]
 5. [illegible]     1745    TCA [illegible] C
 6. pch8-4f         1995    ATT CTG GAG AGT TGG TAT CC
 7. [illegible]     2277    GGA [illegible] AAA AAA [illegible] CTT G
 8. pch8-6f         2559    TCC ACT CAT ATT CCA ATA CC
 9. [illegible]     2849    CCT [illegible] CAU AAC TGT TC
10. pch8-4rb        3090    GGA CCC TTC ACT TCC TTA C
11. [illegible]     3370    GGC [illegible]
12. pch8-2rb        3517    CAG AAC ACT GCT CTA ACT G
13. pch8-1rb        3970    GTA [illegible] TTA [illegible] AAT [illegible]

- strand (antisense)                sequence (5'->3')

14. [illegible]     3617    CAG [illegible]
15. pch8-3r         3360    CCC AGG ACA AGT GGT GGC C
16. pch8-4r         3140    GTA AGG [illegible]
17. pch8-5r         3849    GAA CAG TTC TGT CTC TCA [illegible]
18. pch8-6r         3563    CTT GGG TAT TGG AAT ATG AG
19. pch8-5fb        2277    CAA GCT CTT TCC TTA TTC C
20. pch8-4fb        1999    ATA GGA TAC CAA CTC TCC AG
21. pch8-3fb        1746    TGG TTC TGA TCA TTT GAT G
22. pch8-2fb        1462    CTT GTA ATG CTC CCA TTT GG
23. pch8-1fb        1238    GTC TCT CAG GCA GTT CAG A
24. pch8-fb-1f       941    GTA GAG AAT CAC GTA CAG C
25. pch8-fb-2f       612    CAA TGA CCA GTA GCA TAA C
26. CH8-3670        3891    CAG CAT TTA AGA GAG GCA G
27. CH8a             387    CCT GTA GCT CTG GCT TAG CAT CC
28. CH8b             510    CCC CTT CAT TGA GAT CAT CTA G
Figure 11(A)



S^SSSS?^ gc«cggcxx:g gctcacaggt tctttaatog aggagccaat 

51 CTCTCTGCAC ACCTGGTnC ATCTAATAAT ATACAGRCAC ^QCTCTCAG 

101 GCCAGTTAAT CATCCCCAGT GTCCAGQCAC AGAgSg GtcSS^ 

Im 9^ ^"^^^. CTTTCTAGOC GAGAACAAO: TCTOKSgSa SSSSS 
111 CCTGTOGIAA TCCCATXSlTr GCrraASr tcSaSS? 

251 TGAGmarr cctqctgtgt tcaqgttaaa agacagagct gatcaacSa 

301 AMATOGAGA TATCATATTT GATTTCAGCT ATItSagS tSSS^ 

40i Ji^If -n^AGCCAGAG CTACAGgS? 

J?T AACAACATAG AAATTOTOAC CAGATTTIAT TTAGCATTTC 

Jni TAAATATATT GTAGACTTAA ACAGATATCT AGATOATOTP 

III TTTATATTCA GCAAACCTTA QAAAC^ J^SSS 

III CAACTTCTAT GTGAAGCaCT GEACTEATAT GGmSatoc 

IS? tgaccaaaag attoaaggag aagtcagaga gaSSS?S 

651 GTTTCTTACT ACCGATACM TGCTQCTOGA TCTTCTCCIG ATTC^SJ? 

75? SS^^^r TGliAAGCTGC TICGAAGTAC AGGTOAtS? JSSSg 

am Sl^fS^^ ACCATCCAAC TATCCXGAGA GCTATTTCCA ^StSJ? 

Ill ^^JSSS^^ CCTTCATCAG TATGGICATT GGTCGACTCA GATC^SI 

III SS^^ CAGGTCTCAG CGTATCCTIT GCCGGAGCa? 

III ^grggS^y^ CCAAGCTGCC ATGCTGTACG TCATTCTCTA ctSg^CT 

innJ ^S^f^ ACACCCATCA AGCAAAAATC AGAGAGATAG TCGA^S 

S^^^*^ AATTGGGXAA TTAGTATITA CAOtSGOGOTC AC^^TC 
I?m I^SSS^"^ TTGGGAACCT TACAAAGCTC CAAAAOTGC tSS^ 

^S?^^'^^ TTTCAAATGT CAQAGAACAG GCAAGCA^ ATOCTActS 
CAGTGAAAGA GTGCATGCTC AAGW3CAGCA MTTCTA^ ^SSSS 
TAAGGGAGGA GATQGTTCTG GACAATATCC CAAAGCTTCT GAACTgS?!^ 

SSSS^ f^*^^ 
AGOTCTQAC CCAAACAACA AACGCCTTCG TCAAATCAAG GACCAGATIC 

taacagactc tcgotacaat cccaggatcc TcrrccAGCT StctSSS 

ACTOCAOAT TTCAGTriAT ACTCAAAGAG aSSSSJ SSSJ 
AGAAAAGCAA ACCAAATQGG AGCATTACAA GAAA^GGCT TC^gSS 
TQACTQAGCT TGCTGATGTC TmCAGGAG TCAM^CT S^rSS? 

AAAACCnCA ISJSS 
^I^SS^"" AAITATGATG ATTCTACTCC tSS^S AaJ^^SJJ? 
AACTCATACA AGCTTTQGAA GAGGTTCAAG AAlT^flS CTT^TC? 
t^^^^^^ rajpCAGTT TCTTCCCX3AT ACTCGAaSt TTcSSJS 

TOAAAGAGGA GGTICTCATC aSS^ 
TOGTTQQQQA CCTTTCTTTC GCITCGCAGT TGATTCACAG TTlCACMrnr 
ATCMXSCAAG AAAGCATAAG GGTAAATCCA TCCaSctS cSSS^ 
AGCTACCTOC CTAAAGCTTG CCTCTCCCCT CGATC^S OTCtS^ 
g^?^ AAATCGCCrC GACCTCCTCA GCgStcS SSSS 

?SSS ^5?^ '^^^ 

«^^f ii?r2 TCTCTTCTAA AGASCATAAA GCTTCAGACC CACGACA-rra 
^^AAGTGCC TAC3CCGCCPG GACAAAGACA AQCTCAG^ ctSgCtSS 
CTAGGCOCAC GA-EAaSAQGT TOCCAAGCTT ACI^PMi t^tt^T^^ 
TACTCAAGGC A3CITAA3X» ^SS^ 

2301 gSggg? ?^SS? ^^^P^ SSS 

^^qT S2Sr™*^ TTGOCCTQCA TAGGGGACTG ATAITCAAOC CTCGAG«^a 

2351 GCXau«5TGAA TTGATGCCCA AGCTCAAAGA GTTOGGA^ SStSS^ 

SS^^ISS ■I^^'^SAA TACATACAGG SSSS^ SSS? 
CTGAAGATrr GGCAGGAAGA AGTATCTCGT ATCATAAATT ACASnrT 
GOA^GTOT AATAACTOIC XAAGAACGAA gSSaSJ JSSSS^ 
TGTACCAGIC CACTCATATT CCaATACTCA AGITEAOCCC TOtSqJJS 
Figure 11(B)



2601 TCTCTAACCT TEATTQGTCG ACTCTCCAGA GAAATCCTGC GGATCACAGA 

2651 CCXAAAAATG ACATGnCACA TAGACCAGCT GAACACTIGG TATGATATCA 

2701 A AACTCA TCA G GAAGT GACC AGCAGCOGCC TCTTCTCAGA AATCCAGACC 

2751 ACCTTOGGAA CCTTTOGTCT AAATGGCTTA GACAGGCTTC TGrXXTTTTAT 

2801 GATTGTAAAA GAGTTACAGA ATTTCCTCAG TATCnTCAG AAAATIATCC 

2851 TCAGAGACAG AACTCTTCAG GACACTTTAA AAACCCTCAT GAATCCTCTC 

2901 AGTCCCCTAA AAACTATTGT CGCAAATTCA AATAAAATTT ATTTTICCXSC 

2951 CATTGCCAAA ACACAGAAGA TITOGACroC GTATCTCGAG GCTATAATX3A 

3001 AG^TTOGCA G ATGCA GATT CaXSAGGCAAC AGATTGCCAA TGAATTAAAT 

3051 TATTCTTGTC GGTTTCATTC TAAACATCTG GCAGCTGCTC TCGAGAATCT 

3101 CAATAAGGCT CTCCTAGCAG ACATTGAAGC CCACTATCAG GACCCTTCAC 

3151 TTCCTTACCC CAAAGAAGAT AACACACTTT TATATGAAAT CACAGCCTAT 

3201 CTGGAGGCAG CTQGC ATTCA CAACXXACTG AATAAGATAT ACATAACAAC 

3251 AAAGCGCTTA CXXTEATTTTC CAATTGTAAA CTTTCTATTT ITGATCGCTC 

3301 AGTTGCCAAA ACTTCAATAC AACAAAAATC TGGGAATCGT CTCCCX3AAAA 

3351 CCGACCGACC CXSGTTGATrG GCXACCACTT GTCCTQQGAC TCCTCACTCT 

3401 GCTGAAGCAG TTCCATTCCC GGTACACOGA GCAGCTCCTG GCGCTGATTC 

3451 GCCAGTTTAT CTGCTCCAOG GTQGAGCAGT GTACAAGCXrA GAAGATACCT 

3501 GAAATTCCTG CAGATGTIGT GGGTGCXXnT CTGTTCCTGG AGGAITATCT 

3551 TCG GTACAC A AAGCTACCCA GGASGGTTGC TGAAGCACAT GTX5CCTAATT 

3601 TCATTTTTGA TCAGTTCAGA ACAGTGCTGT AACTGTriTT CCTACTTCTT 

3651 CAATGGAAGG ATTGTCCTTA GATCTTCCXA CCATCACAAA TGAATTTCAA 

3701 GAT6AAAAGA AACTCAGTTG CTCATACAAC TGCATTTTTT CTCTCTATTA 

3751 TGGGAAACAT CAG ACGTT AT GAGTAAGATA TATCTCATGG CATTAGTTAA 

3801 TATAACTGAT ATTCTT^AA TCATGGTATT ACATGCAATT TATAICAGAT 

3851 AAAAGCAGAA CACATTTTTG TACTGCCTCT CTTAAATGCT GAATGTAACT 

3901 GrTATGrATA AATCGATTTA GTTTEATCTT CTAAAGAACT ATTTOTGCAA 

3951 CTCCAGATIT TCAGTAAAAT AGTATTACTA GT 
Figure 12(A)



AFWRSPADRF FMSGANLSAH LVSSNmQfTP ALRFVNHPQC PGTE*SVRLT 
MLDFLAEI^ GOQAIUIIVS CX^N^IIIAELL RLSEFIPAVF RUOmDQQK 
YGDIIFDFSy FKGPELNESK LDAKPELQDI^ DCEFREI^NIE IVTRFYIAFQ 
SVHKYIVDLN RYLDDIOTCV YIQQTLETVL L3HEDGKQLLC EALYLYGVML 
LVZDQKIEGE VRERMLVSYY^ BYSAARSSAD SNUDDICKLL RSTGYSSQPG 
AiCRPSMYPES YFQRVPZNES FISMVIGRLR SDDZYMQVSA YPLFEHRSTA 
LMIQAAKLYV ILYFEPSIW T(^2AKMR£IV DKYFPD^MVI SIYHISITVNL 
VDAHEFVKAA KTALMNTLDL SMVREQASRY ATVSERVHAQ VQQFIiCBGYL 
KEEMVLEtnP KLUICLJUXSf VAXPWUflJTr A£>SACDPKNK RLHQIKDOIL 
TDSRVNPRIL FQUIOTAQF EFZLKEMFKQ IILSEKQTKWE HYKKECSSERM 
TELAD7FSGV KPLTRVEKNE NLQAHFREIS KQILSLNYDD STAAGRKTVQ 
LIQALEEVQE FHQI^SNLQV CQFLADTRKF UigKIRTINI KEEVLTIMQI 
V(a)LSFAMQL IDSFTSIMQE SIRVmPSHVT KLRATFLKLA SALDLPLUa 
NCSAMHPDULiS VSQYYSGELV SYVRRVLQII PESMFTSUJC HKLCTHDII 
EVPTRLDKDK LREYAQLGPR YEVAKLTHAI SIFTBGILMM KTTLVGIIKV 
DPKQLLEDGI PKELVKRVAF ALHRGLIFKP RAKPSEI11PK LKELGATODG 
FHRSFEYIQD YVNIYGLKIW QEEVSRIINY NVBOECNNFL RTKIQDWBSM 
YQSOHIPXPK FTPVmESVTF IGlOiCBEILR ITDPKmCHI DQIIHWCUK 
THQEVrrSSRL FSEIQITLGT K3I1ISU3RLL CFMIVKELQN FLSMFQKIIL 

RramTQDriiK timiavspix sivamsnkiy fsaiaktqki wtaylcaimk 

VGC94QIIiRQQ lANEIll^SCR FDSKHLAAAL ENLNKALLAD lEAKYQDPSL 
PYPKEDNTLL YEITAYliEAA GIHNPUJKIY ITTKRLPYFP IVNFLFLIAQ 
LFKLQYKiaG. GMVCRKPTDP VDHPPLVLQL LTLLKQFHSR YlSQUiALIG 
QFICSTVEQC TSQEaPEIPA DVA/GALLFIiE CYVRVTKLPR RVAEAHVPNF 
IFDEFRTVL* UFFLLLQWKD CP* IFPPSQM NUKMKRNSVA HTTAFFLSIM 
GNIRRY£«DI SHGIS*YK*Y CXNHGITCNL YQIKAEHXFV LPLLKAECNC 
YV*IKLVLCS iCELFVQLQIF SKX\/VL 
Figure 12(B)



MLDFLAmOL CX3QAILRIVS CGNAIIAELL RLSEFTPAVF RLKDRADQQK 
YGDIIFDFSY FKGPELWESK LDAKPELQOL DESraNNIE rVTRFYLAFQ 
SVHKYTVDLN RYLDimNBGV YIQQTLETVL UJEDGKQLLC EALYLYGVML 
LVIDQKIBGE VRERMLVSYY RYSAARSSAD SNMDDICKLL ECTXJYSSQPG 
AKRPSNYPES YFQKVPINES FISMVIGRLR SDDIYMQfVSA YPLPEHRSTA 
LANQAAMLYV ILYFEPSIlil TOQAKMREIV DIOTKWWVI SIYMSUVNL 
VDAWEPVKAA KTAUWrLDL SNVRBQASRY AlVSERVHAQ VQQFUCBGYL 
REEMVLENIP KLLNCUICCN VAIRWUlUrr ADSACDPNNK RLRQIKDQIL 
TDSRYNPRIL PQUXOTAQF EFILKEMFKO HLSEKQTKWE HYKKBGSERM 
TELADVFSGV KPLTKVEKNE NLQAWFREIS KQILSI2JYI3D STAAGRKTVO 
LIQALEEVQE EHQI£SNLCfV CQFIAETRKF IHgOIRTINI KEEVLITOQI 
VGDLSFAWQL IDSFTSIMQE SIRVNPSMVT KLRATFLKLA SALDLPLLRI 
NQANRPDLLS VSQYY9GELV SYVRKVLQII PEaiFTSLLK UKLfflHDII 
EVPTRLEKDK LRDYAQIX3PR YEVAKLOHAI SXPTBSIUM RTTLVGIIKV 
DPKQLLEDGI RKELVKKVAF ALHRGLIFNP RAKPSELMPK LKELGAIMDG 
FHRSFEYIQD YVNIYGLKIW QEEVSRIINY NVEQECNNFL RTKIQDWQSM 
YQSTHIPIPK FTPVDESVTF IGRLCREILR ITDPKMrCHI DQIOTWEMK 
ragEV TSSRL FSEIC3TTLGT PGUJGLDRLL CFMIVKELQN FLSMF^JKIIL 
RERTVOOTLK TIWNAVSPLK SIVMJSNKIY FSAIMCTOCI WTAYLEAIMK 
VQC»©ILRQQ lANEI^SCR FDSKHLAAAL ENLNKAUAD lEAHYQDPSL 
PYKCEDOTLL YEITAYLEAA GIHNPUJKIY ITTKRLPYFP IVNFLELIAQ 
LPKLQYNKNL CMVC31KPTDP VTWPPLVUCSL LTIXXQFHSR YIEQLLALIG 
QFICSTVEQC TSQKIPEIPA DWGALLFLE DYVRYTKLPR KVAEAHVPNF 
IFDEFRTVL 





Figure 13(A)



AGG GGC GGA AGT CGG GGT CTG ACC CGC TCC AGG TCC GGG ACT GCG GAT
AGA AGA GGA CCG CCG CCT TGA GGG AGG GGT GGA AAC TGG GTG CCG GCT
CCG CGC GCG ACC TCC GGC CCT GCG CGT GCG CCG TGG CGC GGC CCG GCT

GAC AGG TTC TTT AAT GGA GGA GCC AAT CTC TCT GCA CAC CTG GTT TCA
TCT AAT AAT ATA CAG ACA CCA GCT CTC AGG CCA GTT AAT CAT CCC CAG



^ ^ ^ !^ ACA ATG TTG GAC TTT CTA 

TGT 
CCT 
GAT 
AGC 

^ t:t ^ ^^r^T^k:;^:^si^ 



2^ ^^^2 ^ !^ 1?? ^ ^ "'^ «A ^ ATT GTT TCC IXTT 

. _ _ •** ~»vj v»v»A UAA TTA TGG GAA anf 

AAA CTG GAT GOT AAG CCA GAG CTA CAG GAT TTA GAT GAA ^ TTT 



GGT AAT GCC ATC ATT GCT GAA CTT TTG AGA CTC TCT GAG TTT ATT CCT 
^^^^ '"'"^ ^ ^ "'^•^ ^ ^ ^ -^^r^Q^ 
ATC ATA TTT GAT TTC AGC TAT TTT AAQ GGT CCA GAA TTA TGG GAA 



GTA CAT AAA TAT ATT GTA GAC TTA AAC AGA TAT CTA GAT GAT CTC AAT 
GAA GGG GTT TAT ATT GAG CAA ACC TTA OAA ACT GTG CTT CTC 2^ 
^ <^ ^^l^ OAA GCA CTG TtiC TTA TAT GGA GTT ATG 

CTA CTG 6TC ATT GAC CAA AAG ATT GAA GGA GAA GTC AGA GAG AGG ATG 
CTG GTT TCT TAC TAC OGA TAC AGT GCT GCT CGA TCT TCT GCT GAT tS 
AAT ATG GAC GAT ATT TGT AAG CTG CTT CGA AGT ACA GGT TAT TCT AGC 
CAA CCA GGT GCC AAA AGA CCA TCC AAC TAT CCC GAG AGC TAT Sc C^G 

AGA TCT GAT OAT ATT TAC AAC CAG GTC TCA GCG TAT CCT TTG CCG GAG 
CAT CGC AGC ACA GCC CTO GCA AAC CAA GCT GCC ATt3 CTG TAC 6TO MT 
CTC TAC TTT GAG CCT TCC ATC CTT CAC ACC CAT CAA OCA AAA ATG aS 
GAG ATA GTG GAT AAA TAC TTT OCA GAT AAT TOO GTA aS A^ 
ATC GGG ATC ACA OTT AAT CTA GTA GAT GCT TOO GAA CCT TAC AAA GCT 
OCA AAA ACT GCT TTA AAT AAT ACC CTG GAC CTT TCA AAT GTC AGA GAA 
CAG GCA AGC AGA TAT GCT ACT GTC AGT GAA AGA GTG CAT GCT OTG 

fJS ffi^ IT <»0 ATG GTT CTG Sc 

AAT ATC CCA AAG CTT CTG AAC TOC CTG AGA GAC TOC AAT GTT GCC ATC 
CGA TGG CTG ATG CTT CAT ACA GCA GAC TCA GCC TGT GAC CCA AAC Sc 

AAA OGC CTT OGT CAA ATC AAG GAC CAG ATT CTA ACA GAC TCT CGG TA^ 
AAT CCC AGG ATC CTC TTC CAG CTG CTG TTA GAT ACT GCA CAA TTT oS 
TTT ATA CTC AAA GAG ATC TTC AAG CAA ATG CTT TCA GAA AAG CAA ACC 
AAA TGG GAG CAT TAC AAG AAA OAG GGT TOG GAG CGG ATC ACT GAG CTT 
GCT GAT GTC TTT TCA OOA GTC AAA CCC CTA ACC AGA GTG GAG AAA AM 

GAA AAC CTT CAA GCT TOO TTC AGA GAG ATC TCA AAA CAA ATA TTC 
TTA AAT TAT GAT GAT TCT ACT GCT GCG GGC AGA AAA St ^ SI 
ATACAAGCTTTOGAAGAGGTTCAAGAATTCCACCAOTTCGAATCCAAT 
CTG OA GTA TGT CAG TTT CTT GCC OAT ACT CGA AAG TTT CTT CA; CAA 
^^'^'^^ '^^ STT CTG ATC ACA ATG CAG 

^J^^^ ^ TTG ATT GAC J^ril^ 

TCC ATC ATC CAA GAA AGC ATA AGG GTA AAT CCA TCC ATC GTT ACT AAA 

CTC AGA GCT ACC TTC CTA AAG CTT GCC TCT GCC CTC GAT CTC cS JJJ 
CTT OCT ATT AAT CAG GCA AAT CGC CCC GAC CTC CTC AGC GTC tS CAG 
TAC TAT TCT OOA OAO TTC GTA TCC TAT GTC AGA AAA GTT TTC ^ S? 

SfJ^ *^ '''^ """^^^ ^ "A ATC ATA AAG CTT CAG 

ACC CAC GAC ATT ATT GAA GTC CCT ACC CGC CTC GAC AAA GAC AAG 

AGG GAC TAT GCT CAG CTA GGC CCA OGA TAC GAG GTT GCC AA§ ??? 
CAT GCT ATT TCC ATT TTT ACT GAA 06C ATC TTA ATC ATG A^ 
Figure 13(B)
Figure 14(A)



Arg Gly Gly Ser Arg Gly Leu Thr Arg Ser Arg Ser Gly Thr Ala Asp
Arg Arg Gly Pro Pro Pro * Gly Arg Gly Gly Asn Trp Val Pro Ala
Pro Arg Ala Thr Ser Gly Pro Ala Arg Ala Pro Trp Arg Gly Pro Ala
Asp Arg Phe Phe Asn Gly Gly Ala Asn Leu Ser Ala His Leu Val Ser
Ser Asn Asn Ile Gln Thr Pro Ala Leu Arg Pro Val Asn His Pro Gln
Cys Pro Gly Thr Glu * Ser Val Arg Leu Thr Met Leu Asp Phe Leu
Ala Glu Asn Asn Leu Cys Gly Gln Ala Ile Leu Arg Ile Val Ser Cys
Gly Asn Ala Ile Ile Ala Glu Leu Leu Arg Leu Ser Glu Phe Ile Pro
Ala Val Phe Arg Leu Lys Asp Arg Ala Asp Gln Gln Lys Tyr Gly Asp
Ile Ile Phe Asp Phe Ser Tyr Phe Lys Gly Pro Glu Leu Trp Glu Ser
Lys Leu Asp Ala Lys Pro Glu Leu Gln Asp Leu Asp Glu Glu Phe Arg
Glu Asn Asn Ile Glu Ile Val Thr Arg Phe Tyr Leu Ala Phe Gln Ser
Val His Lys Tyr Ile Val Asp Leu Asn Arg Tyr Leu Asp Asp Leu Asn
Glu Gly Val Tyr Ile Gln Gln Thr Leu Glu Thr Val Leu Leu Asn Glu
Asp Gly Lys Gln Leu Leu Cys Glu Ala Leu Tyr Leu Tyr Gly Val Met
Leu Leu Val Ile Asp Gln Lys Ile Glu Gly Glu Val Arg Glu Arg Met
Leu Val Ser Tyr Tyr Arg Tyr Ser Ala Ala Arg Ser Ser Ala Asp Ser
Asn Met Asp Asp Ile Cys Lys Leu Leu Arg Ser Thr Gly Tyr Ser Ser
Gln Pro Gly Ala Lys Arg Pro Ser Asn Tyr Pro Glu Ser Tyr Phe Gln
Arg Val Pro Ile Asn Glu Ser Phe Ile Ser Met Val Ile Gly Arg Leu
Arg Ser Asp Asp Ile Tyr Asn Gln Val Ser Ala Tyr Pro Leu Pro Glu
His Arg Ser Thr Ala Leu Ala Asn Gln Ala Ala Met Leu Tyr Val Ile
Leu Tyr Phe Glu Pro Ser Ile Leu His Thr His Gln Ala Lys Met Arg
Glu Ile Val Asp Lys Tyr Phe Pro Asp Asn Trp Val Ile Ser Ile Tyr
Met Gly Ile Thr Val Asn Leu Val Asp Ala Trp Glu Pro Tyr Lys Ala
Ala Lys Thr Ala Leu Asn Asn Thr Leu Asp Leu Ser Asn Val Arg Glu
Gln Ala Ser Arg Tyr Ala Thr Val Ser Glu Arg Val His Ala Gln Val
Gln Gln Phe Leu Lys Glu Gly Tyr Leu Arg Glu Glu Met Val Leu Asp
Asn Ile Pro Lys Leu Leu Asn Cys Leu Arg Asp Cys Asn Val Ala Ile
Arg Trp Leu Met Leu His Thr Ala Asp Ser Ala Cys Asp Pro Asn Asn
Lys Arg Leu Arg Gln Ile Lys Asp Gln Ile Leu Thr Asp Ser Arg Tyr
Asn Pro Arg Ile Leu Phe Gln Leu Leu Leu Asp Thr Ala Gln Phe Glu
Phe Ile Leu Lys Glu Met Phe Lys Gln Met Leu Ser Glu Lys Gln Thr
Lys Trp Glu His Tyr Lys Lys Glu Gly Ser Glu Arg Met Thr Glu Leu
Ala Asp Val Phe Ser Gly Val Lys Pro Leu Thr Arg Val Glu Lys Asn
Glu Asn Leu Gln Ala Trp Phe Arg Glu Ile Ser Lys Gln Ile Leu Ser
Leu Asn Tyr Asp Asp Ser Thr Ala Ala Gly Arg Lys Thr Val Gln Leu
Ile Gln Ala Leu Glu Glu Val Gln Glu Phe His Gln Leu Glu Ser Asn
Leu Gln Val Cys Gln Phe Leu Ala Asp Thr Arg Lys Phe Leu His Gln
Met Ile Arg Thr Ile Asn Ile Lys Glu Glu Val Leu Ile Thr Met Gln
Ile Val Gly Asp Leu Ser Phe Ala Trp Gln Leu Ile Asp Ser Phe Thr
Ser Ile Met Gln Glu Ser Ile Arg Val Asn Pro Ser Met Val Thr Lys
Leu Arg Ala Thr Phe Leu Lys Leu Ala Ser Ala Leu Asp Leu Pro Leu
Leu Arg Ile Asn Gln Ala Asn Arg Pro Asp Leu Leu Ser Val Ser Gln
Tyr Tyr Ser Gly Glu Leu Val Ser Tyr Val Arg Lys Val Leu Gln Ile
Ile Pro Glu Ser Met Phe Thr Ser Leu Leu Lys Ile Ile Lys Leu Gln
Thr His Asp Ile Ile Glu Val Pro Thr Arg Leu Asp Lys Asp Lys Leu
Arg Asp Tyr Ala Gln Leu Gly Pro Arg Tyr Glu Val Ala Lys Leu Thr
His Ala Ile Ser Ile Phe Thr Glu Gly Ile Leu Met Met Lys Thr Thr
Leu Val Gly Ile Ile Lys Val Asp Pro Lys Gln Leu Leu Glu Asp Gly
Ile Arg Lys Glu Leu Val Lys Arg Val Ala Phe Ala Leu His Arg Gly
Figure 14(B)



Leu Ile Phe Asn Pro Arg Ala Lys Pro Ser Glu Leu Met Pro Lys Leu
Lys Glu Leu Gly Ala Thr Met Asp Gly Phe His Arg Ser Phe Glu Tyr
Ile Gln Asp Tyr Val Asn Ile Tyr Gly Leu Lys Ile Trp Gln Glu Glu
Val Ser Arg Ile Ile Asn Tyr Asn Val Glu Gln Glu Cys Asn Asn Phe
Leu Arg Thr Lys Ile Gln Asp Trp Gln Ser Met Tyr Gln Ser Thr His
Ile Pro Ile Pro Lys Phe Thr Pro Val Asp Glu Ser Val Thr Phe Ile
Gly Arg Leu Cys Arg Glu Ile Leu Arg Ile Thr Asp Pro Lys Met Thr
Cys His Ile Asp Gln Leu Asn Thr Trp Tyr Asp Met Lys Thr His Gln
Glu Val Thr Ser Ser Arg Leu Phe Ser Glu Ile Gln Thr Thr Leu Gly
Thr Phe Gly Leu Asn Gly Leu Asp Arg Leu Leu Cys Phe Met Ile Val
Lys Glu Leu Gln Asn Phe Leu Ser Met Phe Gln Lys Ile Ile Leu Arg
Asp Arg Thr Val Gln Asp Thr Leu Lys Thr Leu Met Asn Ala Val Ser
Pro Leu Lys Ser Ile Val Ala Asn Ser Asn Lys Ile Tyr Phe Ser Ala
Ile Ala Lys Thr Gln Lys Ile Trp Thr Ala Tyr Leu Glu Ala Ile Met
Lys Val Gly Gln Met Gln Ile Leu Arg Gln Gln Ile Ala Asn Glu Leu
Asn Tyr Ser Cys Arg Phe Asp Ser Lys His Leu Ala Ala Ala Leu Glu
Asn Leu Asn Lys Ala Leu Leu Ala Asp Ile Glu Ala His Tyr Gln Asp
Pro Ser Leu Pro Tyr Pro Lys Glu Asp Asn Thr Leu Leu Tyr Glu Ile
Thr Ala Tyr Leu Glu Ala Ala Gly Ile His Asn Pro Leu Asn Lys Ile
Tyr Ile Thr Thr Lys Arg Leu Pro Tyr Phe Pro Ile Val Asn Phe Leu
Phe Leu Ile Ala Gln Leu Pro Lys Leu Gln Tyr Asn Lys Asn Leu Gly
Met Val Cys Arg Lys Pro Thr Asp Pro Val Asp Trp Pro Pro Leu Val
Leu Gly Leu Leu Thr Leu Leu Lys Gln Phe His Ser Arg Tyr Thr Glu
Gln Leu Leu Ala Leu Ile Gly Gln Phe Ile Cys Ser Thr Val Glu Gln
Cys Thr Ser Gln Lys Ile Pro Glu Ile Pro Ala Asp Val Val Gly Ala
Leu Leu Phe Leu Glu Asp Tyr Val Arg Tyr Thr Lys Leu Pro Arg Arg
Val Ala Glu Ala His Val Pro Asn Phe Ile Phe Asp Glu Phe Arg Thr
Val Leu * Leu Phe Phe Leu Leu Leu Gln Trp Lys Asp Cys Pro *
Ile Phe Pro Pro Ser Gln Met Asn Leu Lys Met Lys Arg Asn Ser Val
Ala His Thr Thr Ala Phe Phe Leu Ser Ile Met Gly Asn Ile Arg Arg
Tyr Glu * Asp Ile Ser His Gly Ile Ser * Tyr Asn * Tyr Cys
Leu Asn His Gly Ile Thr Cys Asn Leu Tyr Gln Ile Lys Ala Glu His
Ile Phe Val Leu Pro Leu Leu Asn Ala Glu Cys Asn Cys Tyr Val *
Ile His Leu Val Leu Cys Ser Lys Glu Leu Phe Val Gln Leu Gln Ile
Phe Ser Lys Ile Val Leu Leu
+ strand (sense)       1st base    sequence (5'->3')

 1. pch13-sp6-1f         370       TTT ACT TCT AAC GCT TAT TC
 2. pch13-sp6-2f         726       TGA AGG AGT CCT TTG AGA CG
 3. T7.1                1140       TCA CAA TGG GCT ACT GG
 4. T7.2                1361       TTC AAC GAG GGA GAT GG
 5. T7.3                1602       TTA GCA CCA CTG AGA GA
 6. T7.4                2041       GTT CTT TTA GGC ATT TA
 7. ch13-2480           2486       GCT GCG TCT GTT CGT CAG C

- strand (antisense)

 8. SP6.1               2746       CCT CTG CTT CAC AAC AT
 9. SP6.2               2490       GCA GtA GGG CGG ACA CC (C)
10. SP6.3               2213       AGG GTC TTC TTC ATT GT
11. SP6.4               1812       GGA TTC TCT TTC TCT CT
12. pch13-t7-1f         1165       AGT GCA CTT CCA TGG GCG TC
13. pch13-t7-1fa         712       CCT TCA TCA GGT TGA CGA AC
14. pch13-t7-2fa         286       GCG GCA ATC AGA AAC GGA AG
15. CH13-AS-1            536       TGA ACA CGT GGT ACA T
Figure 16(A)



1 CTTCCCTGAG CXXTITTCIGC CTGTGTAGGA AGCAGAAGGC GGAATGTCGG 

51 CrCTGCCCTT CTCCX3TAAGA TGCnGCATTA AAACGTTCCT TATAAACTGG 

101 AAMGAAGGC TIOGGAAGAT GGCTAAAATC AGCAATCXTTT GGAATAACX3C 

151 AGAAGCATCC CTGCTTCCXrr GGGCXOGCCX: GTGQGCCTGC TTGTGCTGTT 

201 CAGTAGGTGG TTTTTAGAAA GQGCTTOCTT C3\QCGTCA.TT AGCAACAGGA 

251 GTCGTCGTCC GTTTGCATGA GGAAATGTTC TTAACCTTCC GTTTCTGATT 

301 GCCTCTAGAC TOCATCTCTC ATAGACAAAT GCCCCCATCT TTTACAGAGA 

351 ACCAGTCTCT TCTTrAAACT TTACTTCTAA CXSCTEATrCT TTTTACCTTA 

401 TATAGGAAAC CACTGATTGC TTGTGTQGAG AAACAGCTAT TAGGAGAACA 

451 TTTAACAGCA ATTCTGCAGA AAC3GGCTCGA CCACTTACTG GATQAGAACA 

501 GAGTGCCGGA CCTCGCACAG ATGTACCAGC TGTOCAGCCG GGTGAGGGGC 

551 QGGCAGCAGG CGCTGCTQCA GCACTOGAGC GAGTACATCA AGACTnTQG 

601 AACAGCGATC GTAATCAATC CTGAGAAAGA CAAAGACATG GTCCAAGACC 

651 tOTTGGACTT CAAOGACftAG GTGGACCACG 7GAT06AGGT CTGCTTCCAG 

701 AAGAATGAQC GGTTOGTCAA CCTGATGAAG GAGTCCXTTO AGACGTTCAT 

751 CAACAAGAGA CCCAACAAGC CTGCA6AACT GAITCGCAAAG CATGTGGATT 

801 CAAAGTTAAG AGCAGGCAAC AAAGAAGCCA CAGACGAGGA GCTGGAGCX^ 

851 ACGTTCGACA AGATCATGAT CCTGTTCAGG TTTATCCACG GTAAAGATGT 

901 CTITCAAGCA TTTTATAAAA AAGATITGGC AAAAAGACTC CTTGTrGGGA 

951 AAAGTGCXrrC AGTCGATGCT GAAAAGTCTA TGTTGTCAAA GCTCAAGCAT 

1001 GAGTGCX3GTG CAGCCTTCAC CAGCAAGCTG GAAGGCATGT TCAAOGACAT 

1051 GGAGCTTICG AAGGACATCA TGGTTCATrr CAAGCAGCAT ATGCAGAATC 

1101 AGAGTGACTC AGGCXXTTATA GACCTCACAG TGAACATACT CACAATGGGC 

1151 TACTGGCCAA CAIACACXXX: CATG GAAGTG CACTT AACCC CAGAAATGAT 

1201 TAAACnCAG GAAGTATTTA AGGCATTTTA TCTZGGAAAG CACAGTGGTC 

1251 GAAAACTTCA GTGGCAAACT ACTTTGGGAC ATGCTCTTTr AAAAGCGGAG 

1301 TTTAAAGAAG GGAAGAAGGA ATTCCAGGTG TCCCTCTTCC AGACACTGGT 

1351 GCTCCTCATG TTCAACGAGG GAGATQGCTT O^GCTTTGAG GAGATAAAAA 

1401 TGGCCACGGG GATAGAGGAT AGTGAAnGC GCAGAACX3CT GCAGTCCCTG 

1451 GCCTGTGGCA AAGCACGTGT GCTGATTAAA AGTCXTCAAAG GAAAGGAAGT 

1501 GGAAGATGGA GACAAGTTCA TTTTTAATGG AGAGTTCAAG CACAAGTTGT 

1551 TTAGAATAAA GATCAATCAA ATTCAGATGA AGGAAACTC3T TGAGGAACAG 

1601 GTTAGCACCA CTGAGAGAGT GTTTCAGGAT AGACAATATC AGATTGATGC 

1651 TGerAax:x?rc agaataatga agatgagaaa gactcttggt cataatcttc 

1701 TAGITTCTGA ATTATATAAT CAGCTGAAAT TICCAGTAAA GCCTGGAGAT 

1751 TTGAAAAAGA GAATTGAATC TCTGATAGAC AGAGACTATA TGGAGAGAGA 

1801 CAAAGACAAT OCGAATCAGT ACCACTACXTT GGCCTGACX5C ATCTCCAGAC 

1851 GCTTCCCXriT CATGAAACAC TAGAATGTAC CCTCAGAGCA GGAAGCACAC 

1901 CTGTGCCATT TCTQGGACTC TGATTGATCC AGCTGTGGAC ATTGGAAGGC 

1951 GAAGGAAGQG AGGTGGCTCC TGGGTCATCT TTCACAAGGC TCAAGACTTC 

2001 AACXTTGCAGA TCTATCTTTT TCCCTCCAGT TTTTCCTCTA GTTCTTTTAG 

2051 GCATTTAAAT TOTTTCTGTT ACTCTGTGCA AAATAACTTT GAGATTOGAC 

2101 AAGAAGAOXST TACTAAAGAG AAGTrCCTTT AAAAGGICTT GriCril^ ' mT 

2151 CAAAAAGCT6 CAAGTITOGT TmTliri O rr GTGTGATCAT GAGTGCACAA 

2201 TQAAGAAGAC (XTAGATGCT GGATTTTTTA GCICTGAAGA TTCCTTAGGT 

2251 ATCCCTGAAG ACAGCTCOCT CAGATGATCA GCATTTAGAG TGAAAACAAG 

2301 GGCCXrrrCAT GGGTGAACAT TAGAAAGAGC CAGGGTTCAA AGCTGGOGAA 

2351 TGGATCACGC ACCTTAGCCA CTGGCCCCTC CXTTOrrrTCAT GTAI TICCAA 

2401 AAGTTGTTkAA CTnGGTQGC TGATTTTTCG TAAGTCAGGT 1TCTAAGTGA 

2451 GCTCCCTGAG GTGCCAAGGC CATCGTGTCC GCCCTGCTGC GTCTGTTCGT 

2501 CAGCTCAGTT CCTTGTGAAT CTCTOTTTTA GGGGTTGGGG CTAGTGTGTT 
2551
2601 



^GTTTCCA TTCIAAGATT GACSTCTOGCA GTCCrir-mn. 
GQCTAACTCC TCTITCATIT TTTITAAraG SSSI ^KSCATTG 
2651 TAATAAAGrr TCGTITCGTr SSrS SSS^F"^ GTQATTGCAA 

2701 TCTCTCCTCT AAACTCTiil I^^^^ TGCGCAGGGA CGATCCTICT 

2751 GTCaJSSS SSSJS JSS^ A^AAAG TctSS 

2801 ^QGTTITCro SS^S^ 

TCAGnVGTGA TOmfiAAtSe f^TM^r^SS? ™^CACTG AAAGGAAACT 
2901 CATTTAAAAG StSS? SS^S^ CAAAGATACT TTTGAGaS 

2951 AAAGCTACCA S^JJ ATTPIGAI^J 

3051 GacTCTcrar ttoaaacatc otscttS tl^^^^^ aaatoattaa 

3101 CCATA.TAAA A^^COCAcS ^^^^ "^^^P^ 

3151 TCATTTATCA GTTfY-^»i; iTOWmyiv ATTTTTATCT ITCAAAA'Trr 

3201 iSSS S^^S SSSS 

|251 ATTORATCTA GACTTACTTT GAaSSSS TAATATHSIA 

3301 TACATTAATA AAACTxS S^JSJ^^ ^SS?" 
Figure 17



1 FPEPFLPV^E ABGGMSALPF SVKWCIKTFL INWK-RLGKM AKISNPWNNA

51 EASLLPWARP WACLCCSVGG F^KGLPSASL ATGWVRLHE EMFLTFRF^L

101 PLDCICHRffll PPSFTENQSL L.lUjJTLIIi FTLYRKPLIA CVEKQLLGEH

151 LTAILQKGLD HLLDENRVPD LAQMYQLFSR VRGGQQALLQ HWSEYIKTPG

201 TAIVTNPEKD KEMVQDLLDF KDKVDHVIEV CFQKNERFVN LMECESFETFI

251 NKRPIQCPAEL lAKHVDSKLR AGtlKEATDEE LERTLDKIHI LFRFIHGKIV

301 FEAFYKKDIA KRLLVGKSAS VDAEECSHLSK LKHECGAAFT SKLEGMFKDM

351 ELSKDZMVHF KQHMQNQSDS GPIDLIVNIL TM3VWPTVTP MEVHLTPEMI

401 KLQEV/FKAPy LGKH SGRKLQ WOTTLGHAVL KAEFKEGKKE FQVSLFQTLV

451 LLMFNEGDGF SFEE IKMftTG lEDSELRRTL QSIACGKAKV LIKSPKGKEV

501 EDGDKFlt'NG EFKKKLFRIK INQIC»1KETV EBQVSTTERV FQDRQYQinA

551 AIVRIMraiRK TLGKNLLVSE LYNQLKFPVK PGDUOCRXES LXDRDYMERD

601 KDNPNQYHW A^RICRRFPF MKH*NVPSEQ EAHLCHFWDS D^SSCGHWKA

651 KEGRWLL6HL SC2GSHLQPAD VSFSLQFFL* FF^AFKLFLL LCAK^L^DWT

701 RRCy*REV/PL KGLVLVSKSC KFGI*FSCVIM SAQ-RRP^ML HFIiAUaP»V

751 SLKTARSDDQ HLE-KQGPFM GEH«KEPGFK AGEW^fTHPSK V7PLPVSCXSK

801 SCKUHWLXFR KSGF*VSSLR CQGSXSmPMi SVRQLSSL^I SVLGVGASVF

851 VFPF*D*VWQ SLFFCIGVTA L*FFLIAVFV •LQ»»SLVWF LQSCAGTILV

901 LCCKL^KVyC DLECS*CCEAE VIXMKD*KDP VGI\ftfFCWYl YHBWSEBKV

951 Q«»C«RGMYD KDTFEITFKS TLYPr»»HVS F*LKATKGIL IMA^VFKAIF

1001 SGIYQWII* PCAKLLRV5F ♦NMPV^NWTP CGFPY»NPHS LIVTFIFENF

1051 HL»VP«YW» ERENRFLFFF LISSLCLEIV 1SIIVI*CRLTL NKISLIGLKI

1101 TLUCLCTIMOM TH



151                       .M^QLFSR VRGGQQALLQ HWSE yiKTPG

201 TAIVZN»3CD KEMVQDLLDF KDECVDHUTEVT CFQKNERFVN I21EQESFETFI

251 MKRPNKPAEL IAKHVD6KLR AGNKEATDEE LERIUDK3HI LFRFIHGKDV

301 FEAFVKKDLA KRLLVGKSAS VDAEKSMLSK I^HBCGAAFT SKLEGMFKEM

351 ELSKDIMVHF KQHMQNQSDS GPIDLTVNUi TMGVWPTSTTP MEVHLTPEM

401 KLQEVFKAFY LGKHSGRKLQ WQTTLGHAVL KAEFKEGKKE FQVSLFQTLV

451 LIimJEGDGF SFEEIKMATG lEDSELRRTL QSLACGKARV LIKSPKGKEV

501 EDGDKFIFNS EFKHKLFRIR INQI»1KETV EBQVSTTERV FQDRQYQIDA

551 AIVRZMKMRK TL6HNLLVSE LYNQLKFPVK PGDLKKRIES UDRDYMERD

601 KENRSOYHYV A
+ strand (sense)       1st base    sequence (5'->3')

 1. pch14-sp6-1f         686       GGC TTA ACA CTC AAT GTA C
 2. pch14-sp6-2f        1005       CTA TGA AAA GAC AGC TTA AG
 3. pch14-sp6-3f        1315       ATT TAG TTT GAA AAG CAT G
 4. pch14-sp6-4f        1589       CAG ACT TTA AAG TCA CAA G
 5. pch14-sp6-5f        1808       CAA AGA CTT GGT GTA TAG TG

- strand (antisense)               sequence (5'->3')

 6. pch14-sp6-6fb       2020       GCA GTT TAA TTT GGT CCT G
 7. pch14-sp6-5fb       1757       CTG TAA TTA TAG TTC TCT C
 8. pch14-sp6-4fb       1607       CTT GTG ACT TTA AAG TCT G
 9. pch14-sp6-3fb       1339       ATA ATC ATG CTT TTC AAA C
10. pch14-sp6-2rb       1023       TTA AGC TGT CTT TTC ATA G
11. pch14-sp6-1rb        704       GTA CAT TGA GTG TTA AAC C
12. CH14a                629       CGG CAG AGC TGA CTA CTG GAA GG
13. CH14b                644       CAA GCA GGG AAG TAA CGG CAG
14. CH14c                109       CTT GTT AGC TTG TTT AGA AGG TGG AAG AG
15. [illegible]           90       GGT GGA AGA GAA GGT CTC CTT TCA GGC
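
Several of the sense and antisense primers in the table above share nearly the same first-base position and appear to be reverse complements of one another, for example pch14-sp6-4f (1589) and pch14-sp6-4fb (1607). The following is a minimal Python sketch of that check, not part of the original figure. The helper name `reverse_complement` is illustrative, and the sense primer is written with G where the scanned text shows the digit 6, which is an assumption about the OCR.

```python
# Illustrative helper; not part of the original disclosure.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(primer: str) -> str:
    """Return the reverse complement of a primer written 5'->3'.

    Spaces between triplets are ignored; only A, C, G and T are complemented.
    """
    seq = "".join(primer.split()).upper()
    return seq.translate(_COMPLEMENT)[::-1]


# Primer 4 (sense), assuming the OCR "6" in the scan stands for G.
sense_4f = "CAG ACT TTA AAG TCA CAA G"
# Primer 8 (antisense), as listed.
antisense_4fb = "CTT GTG ACT TTA AAG TCT G"

# True when the two primers anneal to opposite strands of the same site.
print(reverse_complement(sense_4f) == antisense_4fb.replace(" ", ""))
```

The same comparison can be applied to other pairs in the primer tables to flag characters that were probably misread during scanning.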
Figure 19



1 

A, 




AX XA^Vj\JSJX\* 




X V9 loWW XVJWi 


AAuwl- XvaAAA 


1 m 




XV XxiAX X X iXa 


1 ^1 




W XAAk* X AV» X\. 






X vAaAA^ X X 






X AOV,3nV.A-AA«V^ 




jv A IV ^ a/TT* A 


AX XV.X\9XAuiA 






IV jv A r^r* IV r*iv Ji A 
AAAUL.AUAAA 


4UX 


LTl'lVXAAAAA 


X\3(j(AxAi\iAVi 




AAAGCCTTCC 


CCAAX'XVvX'/iA 


DUi, 


AAATltjTAAA 


TATQaxUCAA 


CC1 


ATGTGAGTA6 


AAGAATTGCA 




GCACCACCTT 


CCAGTAGTCA 


0!>1 


GATGGAATGT 


CCCTTCTATC 


•7ni 
/UX 




\jGA.LrX\JCIA!uA 






TGAAATGGAT 


DUX 




AVjAAvjAXvJaX 




Aiotfil Au. X \» X A 


CAoAAV^i'X\^X' 




X \JA X AA X A X \9 


AAGTTTTATT 


7^X 


rvLsX X lv7XAAk.9 


X 1 XAX XAXoX 


1001 
^ wx 


X X XA^XAXVaA 


AAA\3AL.A\3Vm X 


1051 


TGOGGCATGT 


TTGTGCACTG 


1101 


ATCA.TGGTT2V 


GTCATOGTAC 


1151 


TGAGTOGAGA 


GATOCAGTGA 


1201 


ACTITCACTT 


TOXXrCAAAGA 


1251 


CAGCATTGGC 


CAAAGGTACT 


1301 


TTAGmTTA 


AGTGAATTIA 


1351 


GAGGCTGAGT 


GCTACTTTCG 


1401 


CAGGATGAAT 


GAGGTGGGTA 


1451 


GCAGAAAATA 


GGAACAGTTC 


1501 


ATGCCTTCTA 


AATAATTTrr 


1551 


AAArri'mr 


ACAAGTATTT 


1601 


TCACAAGATT 


ATAAATGTAC 


1651 


TTCTCAGAAT 


CXHACAGAAAA 


1701 


GAAATCTTAAA 


AATTAGATTT 


1751 


ATTACAGAGA 


TCAGATCAGA 


1801 


crmxsGccT 


ACTGTATIAC 


1851 


AACTGTTAAG 


GCAAGAAGTG 


1901 


CTGATTTCAA 


AGACTTGGTG 


1951 


OGTTAGAAAA 


GTGGATTAAT 


2001 


TCAGGACCAA 


ATIAAACTGC 



TCGAACAGGA AGCATCTCCA GCAGTGTGTC 
QGAGACCTTC TCITCCAOCT TCTAAACAAG 
AAGGCTATAT CTGAAGCTCA AGAATCCGTA 
TACAGTTOCA CAGAAACAGA CACTTCCAGT 
CTCAAGAAGA ATTGCTAGCA GAAGTGGTCX 
AGAATAAGTC CXTCCATTAA AGAAGAGGAA 
AAAAAATCAA GCTGAGATGA GTGAACTGAG 
AACTTTTGGA GCGCTGCAAG TAGTGGCCTG 
TGTGCCTACC ATCACCCCAT CTCACCCTGC 
ATTTGCTGAA AAATOTnGT TTGTTCACCC 
AGTGTACTAA ACCAGATTGT CCCTTCACTC 
GTACTGTCTC CAAAACCAGT TGCACCACCA' 
GCTCTGOOGT TACTTCCCTG CITGTAAGAA 
ATCCAAAACA TTGTAGGnT AACACTCAAT 
TTCTACCATC CCACXATTAA TGTCCXIACXIA 
TCG ACCT CAA ACCAGCGAAT AGCACCCAGT 
GCAGTTTroA AGTTTTCATG TACTGATGAA 
CAAATCTTTG AAACOTOGAA TATATTGCTT 
G COTAT CTAT CTGAAGTGTC TAATTTTICA 
GGTTTIAACA TTGGGTICTTT T TClTriXJ i T 
TAAGGAAGAG CTAAATTCTG TTAAAATATT 
CTGTT GTOA G GATCAGCATA TGAAATTGAC 
TGCAGCTTAG GGGGCTACAC G G 'rmcm^lt ; 
GGCAGTTGTC ATTATTCTAA AAATTGTACT 
TTATATAATG TTCATAATCC ACCATGAAAA 
GAGGC TGCTT AAAATATTCA ATTCTGCTTT 
GTITGAAAAG CATCATTATA GAGGCCTCTC 
GTAAAGTTCC AGTTTTCCAG CCTTCTGTGA 
TGGACAGTGG AGGCAGCTGG AATGGCAAGT 
TATACAGTGC TCTCATTTAC TAATAACATA 
TTGGGAAACT ACATTATCAC AAAATTATAC 
ACATACTGTA TCTGAAAACA GACTTTAAAG 
ATATGTATIC TCACATTCTG AAAAATAACA 
TATACTTAGT TACT ACTGAA GATAATTTTT 
AAATAGTATA TTTTAAATGA CAGAACTATA 
TAGCTAAACT GCAAG ATAGA TAGGATGAAA 
TTACAGAGTT TITTTGTGTG TGC'ITITIAA 
TCAAATGCTT TAGAGTTAAA TAACAGATCA 
TATAGTGTTA AAAATTAAAG CTTAAAAGGT 
GCAAAAGGGG TAATAAAGAC IGCAACATTC 
T 
Figure 20



1 EDEDDYGSRTG SlSSS\rSVPh KPERRPSLPP SKQANKNLIL KAISEAQESV

51 TKTroySTVP QKQTLPVAPR TRTSQEELIA EWQGQSRTP RISPPIKEEE

101 OKGDSVEKNQ AEMSELSVAQ KPEKLLERCK VWPACKNGDE CAVHHPISPC

151 KAFPNCKFAE KCLFVHPNCK YDftKCTKPDC PFTOVSRRIP VLSPKPVAPP

201 APPSSSQLCR YFPACKKMEC PFYHPKHCOT NTQCTSPDCT FYHPTINVPP

251 RHALKWIRPQ TSE-HPVLPG RRSCSLEVFM Y»*KILYRTC QIFETWNILL

301 S^YEVLLPiy UCCLIFQVCK FIMWF^HWVF ^LFCFVrVEKTA •GRAKFC*NI

351 WGMFVHCX XE DQHMKLTSWL VMVLQLRGLH GCCVSGEMQ* GSCHYSKNCT

401 TFTFPKDYTM FIIHHEMSIG QRY-GCLKYS ILLFSF^VNL V-KA^LYRPL

451 EAECYFR»SS SFPAFCESIMN EVGMDSGGSW NGKCRK*EQF YTVLSFINNI

501 MPSK*FFWET TLSQNYTNFF TSIYILYLKT DFKVTRL^MY ICILTF^KIT

551 PSESTENILS VY«R*FLKCK N^I^IVYFK* QN ^VRD QIR •VNCKIDRMK

601 UAYCITYRV ^l^^^WFKTVK ARSVKCFRVK •QITDFKDLV YSVKN»SLRG

651 G-KSGLWeKG ••RLQHSQDO HCL



mODYGSRTG SISSSVSVPA KPERRPSLPP SKQANKNLIL KAISEAOesv 

EvSsS SiSS 

AEMSELSVAQ KPEKLLER CK YWPACKMGDR CAYHHPISPC 
Figure 21



1 AAAACTTTCG GAAGAGAAAG TTOOdGTOG TAAGTTCAGT TGTTAAAGTA

51 MAMATTCIi ATCATGATC6 AGAAGAGGAG GAAQAAGATG ATGATTACX3G

101 GTCTCGAACA GQAAGCATCT CXAGCAGTGT GTCTGTGCCT GCAAAGCCTG

151 AAAGGAGACC TTCTCTTCCA CCrrCTAAAC AAGCTAACAA GAATCTGATT

201 TTGAAGGCTA TATCTGAAGC TCAAGAATCC GTAACAAAAA CAACTAACTA

251 CTCTACAGTT CCACAGAAAC AGACACTTCC AGTTGCTCCC AGAACTCGAA

301 CTTCTCAAGA AGAATTGCIA GCAGAA3TCG TCCAGGGGAC AAAGTAGGAC

351 cccx:agaata AGTCCCCCXZA TTAAAGAAGA GGAAACAAAA GGAGATTCTG

401 TAGAAAAAAA TCAAGATTAC TATGACATGG AATCCATGGT CCATGCAGAC

451 ACAAGATCAT TTATTCTCSAA GAAGCCAAAG CTGTCTGAGG AAGTANTAGT

501 OGCACCAAAC CAAGANTCGG GGATGAAGAC TCCAGATTCC CTTCGGGTTC

551 TTTCAGGGAC CCTTATGCAG ACACNAQATC TTGTTCAACC AGATAAACCT

601 GCAAGTCCCA AG



1 KTFGRESaJW •VQLLK-KNS IMMEKPRKKM MITGLBQEAS PAVCLCLQSI# 

51 KGDLLFHLIl^ KLTRI«F*KL YLKLKNP^QK QLTTLQFHHN RHFQLLPELB 

101 LXiKKNC«QKW SRGQSRTFRI SPPIKEEEIK GDSVEKNQXDY YEMESMVHAD 

151 TRSFILKKPK LSEEVXVAFN QXSGMKXADS LRVLSGTLUEQ TXDLVQPCKP 

201 ASFK 



1 IW5CTGCTCT GACXXXSNAGN GGAATGNATG GrKXSCTTGTT CNGAAACNNG 

51 CCAGATGGCG NGASGGGGAC AAGTAGCGGC GTGATINAGA AGAGGGAGGT 

101 GAGGGTNCTC ACATCACCNC ATCTOACX::AT GNCGNGCOJT CCCCANTANT 

151 AANANTGATG ATAGMQGGAA GIGGGCXXAC CCAGAAGCNT GATTGAGCGG 

201 CXXXrCAGTAN GAAACMNSTT TGTCXAimA 6NCATACMNA mTTAGGGTT 

251 CMAGC3«30GT COCCX^GCAOC NGCANANNNM CNNCNGGGAC NACNGCXX3W 

301 MNNTEaNGTTA NNCNGNGNAG NNAAAAAATT CAATCATGAT GGAGAAGAGG 

351 AGGAA GAAGA TCATGATTAC GGGTCTCGAA CAGGAAGC31T CTCCAGCAGT 

401 G TGTCTCTGC CTGCAAA " 



Untitled translated in RF 2 

1 SCSDGXXNXW XLVXiQCARMX EGDK«RKDX£ EGGGGXHTCK SXKXXXSFXX 

51 XXHIKGS6PT (^CXD*AAASX KXVCPXXKXX XRVX3CASPAX AXXX3CGXXPX 

101 XXLXXXXKXF NHDGEEEffiD DDYGSRTGSI SS5V5VPA 
CH1-9a11-2

GA AAA CAA ATG GAA GAA ATG CAA AAG GCT TTC AAT AAA ACA ATC GTG 
AAA CTT CAG AAT ACT TCA AGA ATA GCA GAG GAG CAG GAT CAG CGG CAA
ACT GAA GCC ATC CAG TTG CTA CAG GCA CAG CTG ACC AAC ATG ACA CAG 
CTT GTT CAA 

Lys Gln Met Glu Glu Met Gln Lys Ala Phe Asn Lys Thr Ile Val Lys
Leu Gln Asn Thr Ser Arg Ile Ala Glu Glu Gln Asp Gln Arg Gln Thr
Glu Ala Ile Gln Leu Leu Gln Ala Gln Leu Thr Asn Met Thr Gln Leu
Val Gln



CH8-2a13-1

GAA CAG GCA AGC AGA TAT GCT ACT GTC AGT GAA AGA GTG CAT GCT CAA 
GTG CAG CAA TTT CTA AAA GAA GGT TAT TTA AGG GAG GAG ATG GTT CTG 
GAC AAT ATC CCA AAG CTT CTG AAC TGC CTG AGA GAC TGC AAT GTT GCC 
ATC CGA TGG CTG ATG CTT C 

Glu Gln Ala Ser Arg Tyr Ala Thr Val Ser Glu Arg Val His Ala Gln
Val Gln Gln Phe Leu Lys Glu Gly Tyr Leu Arg Glu Glu Met Val Leu
Asp Asn Ile Pro Lys Leu Leu Asn Cys Leu Arg Asp Cys Asn Val Ala
Ile Arg Trp Leu Met Leu



CH13-2a12-1

CTC ACA ATG GGC TAC TGG CCA ACA TAC ACG CCC ATG GAA GTG CAC TTA 
ACC CCA GAA ATG ATT AAA CTT CAG GAA GTA TTT AAG GCA TTT TAT CTT 
GGA AAG CAC AG 

Leu Thr Met Gly Tyr Trp Pro Thr Tyr Thr Pro Met Glu Val His Leu 
Thr Pro Glu Met Ile Lys Leu Gln Glu Val Phe Lys Ala Phe Tyr Leu
Gly Lys His 



CH14-2a16-1 

TG TTT GTT CAC CCA AAT TGT AAA TAT GAT GCA AAG TGT ACT AAA CCA
GAT TGT CCC TTC ACT CAT GTG AGT AGA AGA ATT CCA GTA CTG TCT CCA 
AAA CCA GTT GCA CCA CCA G 

Phe Val His Pro Asn Cys Lys Tyr Asp Ala Lys Cys Thr Lys Pro Asp 
Cys Pro Phe Thr His Val Ser Arg Arg Ile Pro Val Leu Ser Pro Lys
Pro Val Ala Pro Pro 
Figure 23(A)



CTCAGAGAGG GCTGCCAGGA CGCGAGCCAC TGAGGAGCCG CTCAGCCAGC 
GCCATAGCCC TTAGGACTAT CGGTCACATT CTCGCGCTCC TGCTCCGGCT 
CCTCCATCTT GGCCTCGGCA GTGGCGGCTG CCGGGAGGAT GTGCCGCCTT 
CTGGCAGGGG GAAGAAGGAG GAGAAGATGA AGAAGCACCG GCGGGCCTTG
GCCCTGGTCT CCTGCCTCTT TCTGTGCTCT CTGGTCTGGC TTCCCAGCTG 
GCGTGTATGT TGTAAAGAGA GTTCCTCAGC TTCAGCGTCA TCATATTACT 
CTCAAGATGA CAACTGCGCA CTAGAAAATG AAGATGTACA ATTCCAGAAA 
AAGAATACAG AGTCAAAAAA GTTAAGTCCA CCGGTGGTGG AGACACTCCC 
TACAGTTGAT TTGCATGAAG AGTCTTCCAA TGCAGTTGTG GACAGTGAAA 
CTGTTGAAAA TATTTCCAGC TCATCTACCT CAGAAATCAC TCCAATCTCA 
AAGCTTGATG AAATAGAAAA ATCTGGTACT ATTCCGATAG CCAAACCAAG 
TGAAACTGAG CAGTCTGAAA CTGATTGTGA TGTTGGTGAG GCCCTTGATG 
CTAGTGCTCC AATTGAACAA CCTTCCTTTG TCAGTCCACC TGACAGCCTT 
GTTGGCCAGC ATATAGAAAA TGTATCATCT TCACATGGTA AAGGAAAGAT 
AACAAAATCA GAATTTGAAT CAAAAGTTTC AGCAAGTGAA CAGGGCGGTG 
GTGATCCAAA ATCTGCATTG AATGCTTCAG ATAATTTAAA AAATGAGAGC
TCTGATTATA CAAAACCAGG AGACATTGAC CCTACATCAG TAGGAAGTCC 
CAAAGATCCA GAAGATATAC CAACATTTGA TGAATGGAAG AAGAAAGTTA 
TGGAAGTAGA AAA^GAAAAA AGTCAGTCGA TGCATGCATC TTCTAATGGA 
GGTTCACATG CCACCAAAAA GGTCCAGAAA AATCGAAATA ATTATGCCTC 
AGTAGAATGT GGTGCCAAAA TTCTAGCAGC TAATCCAGAA GCCAAGAGCA 
CATCTGCTAT TCTTATAGAA AATATGGATC TTTACATGTT GAATCCTTGC 
AGCACTAAAA TTTGGTTTGT TATTGAACTT TGTGAACCAA TTCAAGTAAA 
ACAGCTTGAT ATTGCAAATT ATGAATTATT TTCTTCTACT CCTAAAGATT 
TTCTGGTTTC TATCAGTGAC AGATATCCAA CAAATAAGTG GATTAAGCTG 
GGTACTTTTC ATGGTAGAGA TGAGCGGAAT GTACAGAGTT TCCCTTTAGA 
TGAACAGATG TATGCAAAAT ATGTCAAGGT TGAGTTGCTA TCACATTTTG 
GATCAGAGCA CTTTTGTCCA TTAAGCCTTA TAAGGGTATT TGGCACTAAC 
ATGGTGGAAG AATATGAAGA AATTGCTGAT TCCCAGTATC ACTCAGAACG 
CCAGGAACTA TTTGATGAGG ACTATGATTA TCCACTGGAT TATAATACTG 
GAGAGGATAA ATCCTCAAAA AATCTTCTTG GTTCTGCTAC AAATGCCATT 
CTAAATATGG TGAATATTGC TGCTAATATT CTGGGAGCAA AAACTGAAGA 
CCTGACAGAA GGAAATAAAA GTATATCTGA GAATGCCACT GCCACAGCTG 
CACCTAAAAT GCCTGAATCA ACTCCTGTTT CAACTCCTGT TCCATCTCCT 
GAGTATGTAA CCACTGAAGT ACACACACAT GACATGGAGC CGTCAACACC 
AGATACTCCA AAAGAGAGTC CCATTGTACA GTTAGTTCAA GAGGAGGAAG 
AGGAGGCAAG TCCATCTACA GTGACCCTTC TGGGCAGCGG TGAACAGGAA
GATGAATCAT CACCCTGGTT TGAGTCAGAG ACACAAATAT TTTGCAGTGA 
Figure 23(B)



ACTGACCACA ATTTGTTGTA TTTCTAGTTT TTCAGAATAC ATATATAAAT 
GGTGTTCAGT TAGAGTTGCT CTTTATCGGC AGCGCAGCCG AACTGCTTTG
AGTAAAGGAA AAGATTATCT TGTGTTAGCT CAACCACCCT TACTACTTCC 
TGCGGAATCA GTAGATGTTT CAGTATTGCA ACCTCTGAGT GGAGAATTGG 
AAAATACGAA TATAGAAAGG GAAGCTGAAA CTCTTGTTCT GGGTGATTTA
AGTAGTAGTA TGCACCAGGA TGACTTGGTG AATCACACTG TAGATGCAGT
TGAACTTGAA CCAAGCCATT CTCAAACTCT TTCTCAGTCT CTTCTTTTAG 
ATATTACCCC AGAAATCAAT CCCTTGCCTA AAATAGAAGT ATCTGAGTCT
GTTGAATATG AGGCAGGACA TATACCATCA CCAGTGATTC CCCAAGAGAG 
TTCTGTTGAG ATCGATAATG AAACAGAACA AAAGTCTGAG AGCTTTAGTT
CTATAGAGAA ACCATCTATT ACCTATGAAA CAAATAAAGT TAATGAGTTA
ATGGATAATA TTATAAAAGA AGATATGAAC TCCATGCAAA TTTTCACAAA 
GCTGTCTGAA ACAATAGTGC CACCAATAAA TACAGCCACT GTACCCGACA 
ATGAAGATGG GGAAGCCAAA ATGAATATAG CTGACACAGC AAAGCAAACT
TTGATTTCTG TTGTGGATTC TTCTTCATTA CCTGAAGTAA AAGAAGAAGA 
ACAGTCTCCA GAAGATGCCC TTTTGAGAGG GTTACAGAGG ACAGCTACAG 
ATTTTTATGC TGAATTGCAA AATTCTACAG ATCTAGGATA TGCTAATGGA
AATCTTGTAC ATGGATCAAA CCAAAAGGAG TCAGTATTTA TGAGACTTAA 
TAATCGTATT AAAGCCTTAG AAGTTAACAT GTCTCTCAGT GGTCGCTATC 
TGGAGGAGCT TAGCCAAAGG TACCGAAAAC AAATGGAAGA AATGCAAAAG
GCTTTCAACA AAACAATCGT GAAACTTCAG AATACTTCAA GAATAGCAGA 
GGAGCAGGAT CAGCGGCAAA CTGAAGCCAT CCAGTTGCTA CAGGCACAGC 
TGACCAACAT GACACAGCTT GTTTCAAATT TATCAGCAAC AGTAGCAGAA 
TTGAAACGGG AGGTTTCAGA TCGACAAAGC TATCTTGTCA TATCTTTGGT
TCTTTGTGTT GTCTTGGGAC TGATGCTTTG TATGCAGCGT TGTCGAAATA 
CTTCTCAATT TGATGGAGAT TATATTTCAA AACTTCCTAA AAGTAATCAG 
TATCCAAGCC CTAAAAGGTG TTTCTCTTCC TATGATGATA TGAATTTGAA 
AAGAAGAACT TCATTCCCAC TCATGAGATC CAAGTCTCTA CAGTTAACTG 
GCAAAGAAGT AGACCCAAAT GATTTGTACA TTGTAGAACC CCTCAAGTTT 
TCTCCAGAAA AGAAGAAGAA GCGCTGCAAG TACAAAATTG AAAAAATTGA
GACCATAAAG CCTGAAGAAC CATTGCACCC CATAGCCAAT GGCGACATAA 
AAGGAAGAAA GCCCTTTACG AACCAGAGAG ATTTTTCTAA TATGGGAGAA
GTTTATCACT CTTCTTATAA AGGTCCTCCA TCTGAAGGAA GCTCAGAAAC 
TTCATCACAG TCAGAAGAGT CCTATTTTTG TGGCATTTCA GCTTGCACAA
GTCTGTGCAA TGGACAGTCT CAAAAGACAA AAACTGAGAA GAGGGCTTTA 
AAACGAAGAC GATCTAAAGT CCAAGACCAA GGAAAATTGA TAAAAACTCT 
AATACAGACT AAGTCGGGAT CATTGCCGAG CCTGCATGAC ATAATCAAAG
GAAACAAAGA GATCACCGTG GGAACATTTG GTGTTACAGC AGTCTCGGGA
Figure 23(C)



CATATCTAAA ATTAATTGAA CTTTTCATAC AGAAGACTTT TTTGTTGTTG 
TTCTTTGAAG AACAGTCTGT AGTATTTGAA GGGTTTGGGG GAGGGAGAAA 
ATATTAATGG GAAAGGCATT CAGAAATTAT GGTTTCTACC TTTTTAAAAA 
GTAGATGGGA TTGTGCTCAA TCTTGGTTAA TGAGCTACAG TTTTACAAAG 
CTGATCACTT CCTATAAGGA CAATGGTAGA CATTTTATAA AGATGTTTTT
TCACAAGATT AATTACTGGG ACAAAAGTAA TTTGGAAGCC CAGTTCCTTA 
GGTGGGATAG GAATGAAAGC CTAAACCTCT TCCTTTAGCT TTGTTCCTAT 
TTCTTGCACC TTCCCATATT TATGTGCCTT TTGTCTATTT ATAATGCCAC 
TGGAAGAGGA GGGATAACTT TTTCTGTTAT TTGATTTCTT TTATAACTTT 
GTTAGGTTTT TGAAGCTGCA AACACTACAA TGCTTTGAGG GGGTCTGTGC 
CTGAAGCTCA GGAGTGTGGA TCAGACAGTC TAAAGATCCT AAAAACTTGC 
CAACTGGATC TTTGTTTAGC AAACTCACTG GAAATGAACA CTTAATGGAA 
TTTTTAAGTC TGTTCTGTTA GGTAGATGGT GATGCTCTTG TTATTTTCAC 
TTATTCAGGC TGGATTACTT CTTACTTAGT TACTAACTCA ATGAGGAAAA 
AATCCCTACA GGATCTTTTT TTGCAAACAA CTGATATATG CAGACAAATT 
TTTGACAAAT TCACCTTTTA AACACGACGT TAACCGATTT GTGAAGGTTT
TCTTTAGCTT ACATTTTAAA CATACACAAT AAACACTAAT CCTCCAAACT 
TTCACTGTTT TTATTAGTAT GAATATAAAA TTTGAAGGTT TGGCCAATTA 
GTACAAGTCT CATGATATAA TCACAGCCTG CATACATATG CACAGATCCA 
GTTAGTGAGT TTGTCAAGCT TAATCTAATT GGTTAAGTCT AAAGAGATTA
TTATTCCTTG ATGTTTGCTT TGTATTGGCT ACAAATGTGC AGAGGTAATA 
CATATGTGAT GTCGATGTCT CTGTCTTTTT TTTTGTCTTT AAAAAATAAT 
TGGCAGCAAC TGTATTTGAA TAAAATGATT TCTTAGTATG ATTGTACAGT 
AATGAATGAA AGTGGAACAT GTTTCTTTTT GAAAGGGAGA GAATTGACCA 
TTTATTGTTG TGATGTTTAA GTTATAACTT ATTGAGCACT TTTAGTAGTG 
ATAACTGTTT TTAAACTTGC CTAATACCTT TCTTGGGTAT TGTTTGTAAT 
GTGACTTATT TAACGCCTTC TTTGTTTGTT TAAGTTGCTG CTTTAGGTTA 
ACAGCGTGTT TTAGAAGATT TAAATTTCTT TCCTGTCTGC ACAATTAGCT 
ATTCAGAGCA AGAGGGCCTG ATTTTATAGA AGCCCCTTGA AAAGAGGTCC 
AGATGAGAGC AGAGATACAG TGAGAAATTA TGTGATCTGT GTGTTGTGGG 
AAGAGAATTT TCAATATGTA ACTACGGAGC TGTAGTGCCA TTAGAAACTG 
TGAATTTCCA AATAAATCTG AACACTTGTC TTTATT 
Figure 24



QRGLPGREPL RSRSASAIAL RTIGHILALL LRLLHLGLGS GGCREDVPPS
GRGKKEEKMK KHRRALALVS CLFLCSLVWL PSWRVCCKES SSASASSYYS
QDDNCALENE DVQFQKKNTE SKKLSPPVVE TLPTVDLHEE SSNAVVDSET
VENISSSSTS EITPISKLDE IEKSGTIPIA KPSETEQSET DCDVGEALDA
SAPIEQPSFV SPPDSLVGQH IENVSSSHGK GKITKSEFES KVSASEQGGG
DPKSALNASD NLKNESSDYT KPGDIDPTSV ASPKDPEDIP TFDEWKKKVM 
EVEKEKSQSM HASSNGGSHA TKKVQKNRNN YASVECGAKI LAANPEAKST 
SAILIENMDL YMLNPCSTKI WFVIELCEPI QVKQLDIANY ELFSSTPKDF
LVSISDRYPT NKWIKLGTFH GRDERNVQSF PLDEQMYAKY VKVELLSHFG
SEHFCPLSLI RVFGTNMVEE YEEIADSQYH SERQELFDED YDYPLDYNTG 
EDKSSKNLLG SATNAILNMV NIAANILGAK TEDLTEGNKS ISENATATAA
PKMPESTPVS TPVPSPEYVT TEVHTHDMEP STPDTPKESP IVQLVQEEEE 
EASPSTVTLL GSGEQEDESS PWFESETQIF CSELTTICCI SSFSEYIYKW
CSVRVALYRQ RSRTALSKGK DYLVLAQPPL LLPAESVDVS VLQPLSGELE 
NTNIEREAET LVLGDLSSSM HQDDLVNHTV DAVELEPSHS QTLSQSLLLD
ITPEINPLPK IEVSESVEYE AGHIPSPVIP QESSVEIDNE TEQKSESFSS
IEKPSITYET NKVNELMDNI IKEDMNSMQI FTKLSETIVP PINTATVPDN
EDGEAKMNIA DTAKQTLISV VDSSSLPEVK EEEQSPEDAL LRGLQRTATD 
FYAELQNSTD LGYANGNLVH GSNQKESVFM RLNNRIKALE VNMSLSGRYL 
EELSQRYRKQ MEEMQKAFNK TIVKLQNTSR IAEEQDQRQT EAIQLLQAQL
TNMTQLVSNL SATVAELKRE VSDRQSYLVI SLVLCVVLGL MLCMQRCRNT
SQFDGDYISK LPKSNQYPSP KRCFSSYDDM NLKRRTSFPL MRSKSLQLTG 
KEVDPNDLYI VEPLKFSPEK KKKRCKYKIE KIETIKPEEP LHPIANGDIK
GRKPFTNQRD FSNMGEVYHS SYKGPPSEGS SETSSQSEES YFCGISACTS 
LCNGQSQKTK TEKRALKRRR SKVQDQGKLI KTLIQTKSGS LPSLHDIIKG
NKEITVGTFG VTAVSGHI*N *LNFSYRRLF CCCSLKNSL* YLKGLGEGEN
INGKGIQKLW FLPF*KVDGI VLNLG**ATV LQS*SLPIRT MVDIL*RCFF
TRLITGTKVI WKPSSLGGIG MKA*TSSFSF VPISCTFPYL CAFCLFIMPL
EEEG*LFLLF DFFYNFVRFL KLQTLQCFEG VCA*SSGVWI RQSKDPKNLP
TGSLFSKLTG NEHLMEFLSL FC*VDGDALV IFTYSGWITS YLVTNSMRKK
SLQDLFLQTT DICRQIFDKF TF*TRR*PIC EGFL*LTF*T YTINTNPPNF
HCFY*YEYKI *RFGQLVQVS *YNHSLHTYA QIQLVSLSSL I*LVKSKEII
IP*CLLCIGY KCAEVIHM*C RCLCLFFCL* KIIGSNCI*I K*FLSMIVQ*
*MKVEHVSF* KGEN*PFIVV MFKL*LIEHF ****LFLNLP NTFLGYCL*C
DLFNAFFVCL SCCFRLTACF RRFKFLSCLH N*LFRARGPD FIEAP*KEVQ
MRAEIQ*EIM *SVCCGKRIF NM*LRSCSAI RNCEFPNKSE HLSL
Figure 25(A)



TAGAATTCAG CGGCCGCTGA ATTCTAGCTG CGGGGTAGGA GTCCGCGGCA 
GCCTCCGGGT AAGCCAAGCG CCGCGCAGTG CTGAGTTCCC GCACGCCGCA
GAGCCATGGA GATCGGCACC GAGACCAGCC GCAAGATCCG GAGTGCCATT
AAGGGGAAAT TACAAGAATT AGGAGCTTAT GTTGATGAAG AACTTCCTGA 
TTACATTATG GTGATGGTGG CCAACAAGAA AAGTCAGGAC CAAATGACAG
AGGATCTGTC CCTGTTTCTA GGGAACAACA CAATTCGATT CACCGTATGG 
CTTCATGGTG TATTAGATAA ACTTCGCTCT GTTACAACTG AACCCTCTAG 
TCTGAAGTCT TCTGATACCA ACATCTTTGA TAGTAACGTG CCTTCAAACA 
AGAACAATTT CAGTCGGGGA GATGAGAGGA GGCATGAAGC TGCAGTGCCA 
CCACTTGCCA TTCCTAGCGC GAGACCTGAA AAAAGAGATT CCAGAGTTTC 
TACAAGTTCG CAGGAGTCAA AAACCACAAA TGTCAGACAG ACTTACGATG 
ATGGAGCTGC AACCCGACTA ATGTCAACAG TGAAACCTTT GAGGGAGCCA 
GCACCCTCTG AAGATGTGAT TGATATTAAG CCAGAACCAG ATGATCTCAT 
TGACGAAGAC CTCAACTTTG TGCAGGAGAA TCCCTTATCT CAGAAAGAAC 
CTACAGTGAC ACTTACATAT GGTTCTTCTC GCCCTTCTAT TGAAATTTAT 
CGACCACCTG CAAGTAGAAA TGCAGATAGT GGTGTTCATT TAAACAGGTT 
GCAATTTCAA CAGCAGCAGA ATAGTATTCA TGCTGCCAAG CAGCTTGATA 
TGCAGAGTAG TTGGGTATAT GAAACAGGAC GTTTGTGTGA ACCAGAGGTG 
CTTAACAGCT TAGAAGAAAC GTATAGTCCG TTCTTTAGAA ACAACTCGGA 
GAAAATGAGT ATGGAGGATG AAAACTTTCG GAAGAGAAAG TTGCCTGTGG 
TAAGTTCAGT TGTTAAAGTA AAAAAATTCA ATCATGATGG AGAAGAGGAG 
GAAGGAGATG ATGATTACGG GTCTCGAACA GGAAGCATCT CCAGCAGTGT
GTCTGTGCCT GCAAAGCCTG AAAGGAGACC TTCTCTTCCA CCTTCTAAAC 
AAGCTAACAA GAATCTGATT TTGAAGGCTA TATCTGAAGC TCAAGAATCC 
GTAACAAAAA CAACTAACTA CTCTACAGTT CCACAGAAAC AGACACTTCC 
AGTTGCTCCC AGAACTCGAA CTTCTCAAGA AGAATTGCTA GCAGAAGTGG 
TCCAGGGACA AAGTAGGACC CCCAGAATAA GTCCCCCCAT TAAAGAAGAG 
GAAACAAAAG GAGATTCTGT AGAAAAAAAT CAAGCTGAGA TGAGTGAACT 
GAGTGTGGCA CAGAAACCAG AAAAACTTTT GGAGCGCTGC AAGTACTGGC 
CTGCTTGTAA AAATGGGGAT GAGTGTGCCT ACCATCACCC CATCTCACCC 
TGCAAAGCCT TCCCCAATTG TAAATTTGCT GAAAAATGTT TGTTTGTTCA 
CCCAAATTGT AAATATGATG CAAAGTGTAC TAAACCAGAT TGTCCCTTCA 
CTCATGTGAG TAGAAGAATT CCAGTACTGT CTCCAAAACC AGTTGCACCA 
CCAGCACCAC CTTCCAGTAG TCAGCTCTGC CGTTACTTCC CTGCTTGTAA 
GAAGATGGAA TGTCCCTTCT ATCATCCAAA ACATTGTAGG TTTAACACTC 
AATGTACAAG TCCGGACTGC ACATTCTACC ATCCCACCAT TAATGTCCCA 
CCACGACATG CCTTGAAATG GATTCGACCT CAAACCAGCG AATAGCACCC
AGTCCTGCCT GGCAGAAGAT CATGCAGTTT GGAAGTTTTC ATGTACTGAT 
Figure 25(B)



GAAAGATACT CTACAGAACT TGTCAAATCT TTGAAACTTG GAATATATTG 
CTTTCATAAT ATGAAGTTTT ATTGCCTATC TATCTGAAGT GTCTAATTTT 
TCAAGTTTGT AAGTTTATTA TGTGGTTTTA ACATTGGGTG TTTTTGTTTT
GTTTTTACTA TGAAAAGACA GCTTAAGGAA GAGCTAAATT CTGTTAAAAT 
ATTTGGGGCA TGTTTGTGCA CTGCTGTTGT GAGGATCAGC ATATGAAATT
GACATCATGG TTAGTCATGG TACTGCAGCT TAGGGGGCTA CACGGTTGCT 
GTGTGAGTGG AGAGATGCAG TGAGGCAGTT GTCATTATTC TAAAAATTGT 
ACTACTTTCA CTTTTCCCAA AGATTATATA ATGTTCATAA TCCACCATGA 
AAACAGCATT GGCCAAAGGT ACTGAGGCTG CTTAAAATAT TCAATTCTGC 
TTTTTAATTT TTAAGTGAAT TTAGTTTGAA AAGCATGATT ATACAGGCCT 
CTCAGGCTGA GTGCTACTTT CGGTAAAGTT CCAGTTTTCC TGCCTTCTGT 
GACAGGATGA ATGAGGTGGG TATGGACAGT GGAGGCAGCT GGAATGGCAA 
GTGCAGAAAA TAGGAACAGT TCTATACAGT GCTCTCATTT ACTAATAACA 
TAATGCCTTC TAAATAATTT TTTTGGGAAA CTACATTATC ACAAAATTAT 
ACAAATTTTT TTACAAGTAT TTACATACTG TATCTGAAAA CAGACTTTAA 
AGTCACAAGA TTATAAATGT ACATATGTAT TCTCACATTC TGAAAAATAA 
CATTCTCAGA ATCCACAGAA AATATACTTA GTTACTACTG AAGATAATTT 
TTGAAATGTA AAAATTAGAT TTAAATAGTA TATTTTAAAT GACAGAACTA 
TAATTACAGA GATCAGATCA GATAGGTAAA CTGCAAGATA GATAGGATGA 
AACTTTTGGC CTACTGTATT ACTTACAGAG TTTTTTTGTG TGTGGTTTTT 
AAAACTGTTA AGGCAAGAAG TGTCAAATGC TTTAGAGTTA AATAACAGAT 
CACTGATTTC AAAGACTTGG TGTATAGTGT TAAAAATTAA AGCTTAAAAG 
GTGGTTAGAA AAGTGGATTA ATGCAAAAGG GGTAATAAAG ACTGCAACAT 
TCTCAGGACC AAATTAAACT GCTAA 
Figure 26



*NSAAAEF*L
MEIGT ETSRKIRSAI

KGKLQELGAY VDEELPDYIM VMVANKKSQD QMTEDLSLFL GNNTIRFTVW 
LHGVLDKLRS VTTEPSSLKS SDTNIFDSNV PSNKNNFSRG DERRHEAAVP 
PLAIPSARPE KRDSRVSTSS QESKTTNVRQ TYDDGAATRL MSTVKPLREP 
APSEDVIDIK PEPDDLIDED LNFVQENPLS QKEPTVTLTY GSSRPSIEIY 
RPPASRNADS GVHLNRLQFQ QQQNSIHAAK QLDMQSSWVY ETGRLCEPEV 
LNSLEETYSP FFRNNSEKMS MEDENFRKRK LPVVSSVVKV KKFNHDGEEE
EGDDDYGSRT GSISSSVSVP AKPERRPSLP PSKQANKNLI LKAISEAQES
VTKTTNYSTV PQKQTLPVAP RTRTSQEELL AEVVQGQSRT PRISPPIKEE
ETKGDSVEKN QAEMSELSVA QKPEKLLERC KYWPACKNGD ECAYHHPISP
CKAFPNCKFA EKCLFVHPNC KYDAKCTKPD CPFTHVSRRI PVLSPKPVAP 
PAPPSSSQLC RYFPACKKME CPFYHPKHCR FNTQCTSPDC TFYHPTINVP 
PRHALKWIRP QTSE 